Let me know if this helps. This is using MSSQL (T-SQL) syntax so you might have to adjust it for postgres, but, using a temp table called #tree, containing the two columns id and parent and populated as your example is with
SELECT 1, 0
UNION
SELECT 2, 1
UNION
SELECT 3, 1
UNION
SELECT 4, 2
UNION
SELECT 5, 2
UNION
SELECT 6, 4
UNION
SELECT 7, 6
UNION
SELECT 8, 6:
with cte as
(
select id as currentnode, id as root
from #tree
union all
select t.id as currentnode, cte.root
from #tree t join cte on t.parent = cte.currentnode
)
select *
from cte
order by root
The above recursive CTE will yield all nodes reachable starting from some root. You can do:
SELECT root, count(*)
from cte
group by root
The count(*) query should yield,
root | count
----------------
1 | 8
2 | 6
3 | 1
4 | 4
5 | 1
6 | 3
7 | 1
8 | 1
You could then subtract 1 from each of those to get the result you were hinting at. Is this what you were looking for? Even if not, I feel like a recursive CTE can probably help with what you need.
The first thing I will say here is that you really need to encapsulate this properly to make it work. Secondly this is a good case where breaking 1NF is likely to be a good thing performance-wise.
I would break this into three pieces, totally mangling Codd's rules in the process, but I think those can be broken in certain cases and here breaking one piece demands breaking more pieces in order to correct the damage.
The first piece is the administration portion. In this portion, permissions are stored in the form you are currently using. This is used for updating and managing the permissions in a relational way. These would then be compiled for a specific row using a stored procedure.
The second piece is the storage piece. Here the rows are stored with the roles in an array as you suggest in your first answer. The arrays here are indexed using GIN which makes lookups a lot faster.
Third, you need encapsulation logic, such as views to only retrieve the rows a user has access to, or a stored procedure interface which checks the roles adding WHERE my_role_id = ANY(roles)
to the end of the query.
To make this work you will need to compile the permissions when they are changed. However this structure allows you to say "all rows that Bob has access to can now be accessed by Joe."
CREATE TABLE foo (
id serial not null unique,
....
roles int[]
);
CREATE TABLE foo_roles (
foo_id int not null references foo(id),
role_id int not null references roles(id)
);
CREATE OR REPLACE FUNCTION rebuild_foo_roles(in_foo_id int) RETURNS int[]
LANGUAGE SQL AS $$
UPDATE foo SET roles = (select array_agg(role_id) from foo_roles
where foo_id = $1)
WHERE role_id = $1
RETURNING roles;
$$;
Then you'd update the permissions by altering the data in foo_roles and then calling rebuild_foo_roles
with the ids of the rows you want to modify.
Now one thing about this structure is that it makes lookups fast, but updates to the permissions pretty slow.
Update: A note about GIN indexes (per request).
Traditional database indexes are b-trees, which means you can search by comparing an expected value to branches of the index, and then traversing to leaves. This works well if searching either on the whole value (WHERE foo = 'my_fun_value') but it doesn't work well with more complex searches. GIN indexes (Generalized Index Method) allow for indexes to be used in a wider variety of searches, but they are slightly slower than b-tree indexes. GIN indexes can be used to ask questions like:
Is value x in array y? (our use case here)
Does circle A overlap with circle B?
Does line segment C cross line segment D?
Does appointment A overlap time-wise with appointment B?
In newer versions of PostgreSQL you can use these indexes to enforce constraints like "No appointments in any given room may overlap time-wise" but the main use case here is that it makes searching arrays to be fast.
In general a GIN index will give you much better performance than a b-tree or GIST index on a join. In this case your query against foo
can be an index scan against the relation with no joins needed.
Best Answer
Easy and very useful: you can launch psql with the proper switch (-E) to get the information.
Have a look at http://www.postgresql.org/docs/current/static/app-psql.html for more information. Once in psql, you could also set the ECHO_HIDDEN variable.