create table division (
id serial primary key,
name varchar not null
);
-- nested sets table
create table location (
id serial primary key,
name varchar not null,
division_id integer not null references division(id),
tree_id integer not null,
parent_id integer references location(id),
lft integer not null,
rgt integer not null,
level integer not null
);
create table report (
id serial primary key,
somevar_1 integer not null,
somevar_2 integer not null,
tyme timestamptz not null,
-- other stuff
location_id integer not null references location(id)
);
-- division hierarchy is Division0 > Division1 > Division2 > Division3
-- reports are generated at Division3, and aggregated at
-- either Division2 or Division1 (usually the latter)
-- this means that to get my data for Division1, i have to
-- do two self-joins on the location table
-- plus one for the report table
given the above schema, i'm experimenting with different ways to speed up querying for the sum of each somevar
aggregated at either Division2 or Division1. right now, what i do is
select
sum(somevar_1) as somevar_1, sum(somevar_2) as somevar_2,
div1.name, div1.id
from
report join location as div3 on report.location_id = div3.id
join
location as div2 on div3.parent_id = div2.id
join
location as div1 on div2.parent_id = div1.id
where
-- for example, all reports this year
date_trunc('year', tyme) = '2019-01-01'
group by div1.id, div1.name
the above query runs in about 4.8s locally with a row count a bit shy of 55k reports for 2019. this is just the base data, without any further processing. the processing step also does something similar, and it's in a web application, so the user is waiting for a not-insignificant amount of time.
for my question: can i use recursive CTEs to speed this up? i haven't yet grokked recursive CTEs, so writing one is still a bit beyond me at this point.
Best Answer
I would stick with one model. If you store both nested set information and parent information, you will have to maintain both, and you may end up with inconsistent information about your tree. If your main focus is on reporting both for div1 and div1, div2, you may consider
GROUPING SETS
. They are aUNION
in disguise:For the second grouping set (div1.id, div1.name) you will get null instead of div2.id, div2.name. That's why I used COALESCE to map the name to TOTAL.
You may also add a level of a
GRAND TOTAL
like:Grouping sets are the most general construction, but under certain situations (not yours), you can use
GROUP BY ROLLUP
andGROUP BY CUBE
which are a bit shorter. See:Grouping sets, rollup, and cube
If I get it right, you will always traverse div1, div2 and div3 levels in your queries and I therefor strongly doubt that a recursive CTE will perform better than 3 static JOINS. The power of RCTE is that it can traverse a variable number of joins. For a fixed number of levels, there is no performance gain.