Imagine I have a table "departments" like this:
name TEXT
parent_department TEXT (nullable)
I also have a "budgets" table like this:
department_name TEXT
budget INTEGER
I need to "traverse" this departments table from "leaf to root", following the parent department column. The end result is that I need to sum up the budgets of each department, but the twist is that the "budgets" table that I'll be using at any given time is very sparse (10s to 100s of rows), while departments is very dense (100m rows).
Normally, I would approach this by creating a "with recursive" CTE that "flattens" the department relationship into a temporary table of just "root department", "leaf department". Then I would join that table on leaf department to budgets, and group by root. The problem is that the "department" table is FAR too big now (the analogy to departments breaks down here a little bit I admit, no company has >100m departments).
Concretely, if I had departments like this (child -> parent):
C -> B
B -> A
A -> NULL
E -> D
D -> NULL
F -> G
G -> NULL
And budgets like this:
A 1
C 3
E 5
I would want to get as output:
A 4
D 5
And ideally, F and G, for example, would never be touched, because there are hundreds of millions of rows like F and G that shouldn't come up. But in my current query, they do. How can I fix this so that I only "traverse" the departments that I actually need to?
Best Answer
It took a while, but I figured it out: