Hierarchy Models – Adjacency List vs Nested Sets

firebirdhierarchytree

I've stumbled upon an article describing nested sets model for storing hierarchical data and was quite intrigued by the supposed performance boost.

after implementing this in our database, i've found that the adjecency list model performs far better. though the article I've read and a couple of SA posts (one, two) stated that any tree traversal is supposed to benefit, my tests demonstrate the opposite.

Am I missing something?

there are roughly 1.1 Million entries in the following table:

SQL> show table tax_nodes;
TAX_ID                          INTEGER Not Null
PARENT_TAX_ID                   INTEGER Nullable
RANK                            VARCHAR(50) Nullable
DIVISION_ID                     INTEGER Nullable
TREE_RIGHT                      INTEGER Nullable
TREE_LEFT                       INTEGER Nullable
CONSTRAINT INTEG_202:
  Primary key (TAX_ID)

Adjecency List Query to fetch parents

with recursive lineage as 
(select tax_id, parent_tax_id 
 from tax_nodes 
 where tax_id = 365612
 union ALL 
 select tax_id, parent_tax_id 
 from tax_nodes t 
 join lineage l on l.parent_tax_id = t.tax_id
) 
select l.tax_id
from lineage l

Nested Sets Query to fetch parents

select
  p2.tax_id from tax_nodes as p1,
  tax_nodes as p2
where
    p1.tree_left between p2.tree_left and p2.tree_right
    and p1.tax_id = 365612

Result

365612 336487 10260 10241 10240 35237 10239 1  //1 is the root

Why is the Adjecency List performing better though most resources suggest otherwise?

I'm running Firebird 2.5 superserver by the way.

UPDATE

On @MarkRotteveel's hint, I took a look at the execution plans and I suspect the NATURAL being the culprit here. Question now is, how do I get rid of it?

adjecency list execution plan

PLAN (LINEAGE TAX_NODES INDEX (RDB$PRIMARY108))
PLAN (LINEAGE T INDEX (RDB$PRIMARY108))

nested sets execution plan

PLAN JOIN (P1 INDEX (RDB$PRIMARY108), P2 NATURAL)

Best Answer

The execution plan of your adjacency list example shows that it is able to use the primary key index (RDB$PRIMARY108) for both parts of the UNION. This is very efficient, mainly because you start with a single value, and then recursively look up its descendents (which is probably just a small set of the entire set of records).

For the nested set execution plan it is not able to use an index, because it has to scan all records for the values of p2.tree_left and p2.tree_right. It can only use the primary key index for looking up p1.tax_id = 365612.

You might be able to improve execution if you add a descending index for tax_nodes.tree_left and an ascending index for tax_nodes.tree_right. You may need to swap ascending/descending because I usually do it wrong initially, even if I account for usually doing it wrong. However you may also need to change the query condition so the optimizer picks the indexes for tree_left and tree_right:

p2.tree_left <= p1.tree_left AND p2.tree_right >= p1.tree_left