Sql-server – Dissecting Joins

sql servert-sql

Recently I asked a question in dba.stackexchange the question was about data hierarchy in sql and a guy named chris allen answered the question and I really loved it but when I try to understand that query much more I got confused.

Here's the query:

Res_id  Res_name         Man_id
1           sam             3
2           Biju            4
3           adrian          Null
4           Helen           3
5           Micah           4


select  
 level1.res_name as level1,
 level2.res_name as level2,
 level3.res_name as level3,
 level4.res_name as level4,
 level5.res_name as level5

from resource as level1
left join resource as level2 on level1.res_id=level2.man_id
left join resource as level3 on level2.res_id=level3.man_id
left join resource as level4 on level3.res_id=level4.man_id
left join resource as level5 on level4.res_id=level5.man_id

where isnull(level1.man_id,0)=0

I did several approaches to crack and was in vain. I think I am missing something well my findings are given below

  1. First he named four columns as each level 1,2,3 etc
  2. Each time he left joined the same table 4 time each with different aliases
  3. By giving condition on each join the table is getting changed to next level
  4. Where condition to specify the manager

The part I failed to understand is the join he made even though I know the use of left join where we will get the columns of left table and on the right if no match is found it will be NULL.So I am looking for some visualization of the join happening here or some understanding of the joins in depth of this particular scenario.If the question doesn't fit here I will delete it . I also asked the same in the question I posted but there was no reply that is why I am putting a new question

Best Answer

Try this query without the joins first:

select  
 level1.res_name as level1

from resource as level1

where isnull(level1.man_id,0)=0

This returns the rows where man_id isn't referencing anything. In other words, this returns the topmost-level managers only. For your particular example it works like this:

  • the FROM clause returns the entire table:

    level1.Res_id  level1.Res_name  level1.Man_id
    -------------  ---------------  -------------
    1              sam              3
    2              Biju             4
    3              adrian           NULL
    4              Helen            3
    5              Micah            4
    
  • the WHERE clause leaves just one row:

    level1.Res_id  level1.Res_name  level1.Man_id
    -------------  ---------------  -------------
    3              adrian           NULL
    
  • finally, the SELECT takes from it only one column:

    level1
    ------
    adrian
    

Adding one self-join like in Chris's query,

select  
 level1.res_name as level1,
 level2.res_name as level2

from resource as level1
left join resource as level2 on level1.res_id=level2.man_id

where isnull(level1.man_id,0)=0

gives you every top-level manager's direct subordinates. More specifically, the join results in the following row set:

level1.Res_id  level1.Res_name  level1.Man_id  level2.Res_id  level2.Res_name  level2.Man_id
-------------  ---------------  -------------  -------------  ---------------  -------------
1              sam              3              NULL           NULL             NULL
2              Biju             4              NULL           NULL             NULL
3              adrian           NULL           1              sam              3
3              adrian           NULL           4              Helen            3
4              Helen            3              2              Biju             4
4              Helen            3              2              Biju             4
5              Micah            4              NULL           NULL             NULL

The WHERE clause filters it down to just this:

level1.Res_id  level1.Res_name  level1.Man_id  level2.Res_id  level2.Res_name  level2.Man_id
-------------  ---------------  -------------  -------------  ---------------  -------------
3              adrian           NULL           1              sam              3
3              adrian           NULL           4              Helen            3

And finally the SELECT clause returns only the names:

level1  level2
------  ------
adrian  sam
adrian  Helen

In the same manner, adding one more join, like this:

select  
 level1.res_name as level1,
 level2.res_name as level2,
 level3.res_name as level3

from resource as level1
left join resource as level2 on level1.res_id=level2.man_id
left join resource as level3 on level2.res_id=level3.man_id

where isnull(level1.man_id,0)=0

brings about the next level of subordinates:

  • this is the result of joining the third instance of the table to the results of the first join:

    1.Res_id  1.Res_name  1.Man_id  2.Res_id  2.Res_name  2.Man_id  3.Res_id  3.Res_name  3.Man_id
    --------  ----------  --------  --------  ----------  --------  --------  ----------  --------
    1         sam         3         NULL      NULL        NULL      NULL      NULL        NULL
    2         Biju        4         NULL      NULL        NULL      NULL      NULL        NULL
    3         adrian      NULL      1         sam         3         NULL      NULL        NULL
    3         adrian      NULL      4         Helen       3         2         Biju        4
    3         adrian      NULL      4         Helen       3         5         Micah       4
    4         Helen       3         2         Biju        4         NULL      NULL        NULL
    4         Helen       3         5         Micah       4         NULL      NULL        NULL
    5         Micah       4         NULL      NULL        NULL      NULL      NULL        NULL
    

    (I've shortened the table aliases for convenience: 1 stands for level1, 2 for level2, and 3 for level3)

  • this is what remains after the WHERE filter:

    1.Res_id  1.Res_name  1.Man_id  2.Res_id  2.Res_name  2.Man_id  3.Res_id  3.Res_name  3.Man_id
    --------  ----------  --------  --------  ----------  --------  --------  ----------  --------
    3         adrian      NULL      1         sam         3         NULL      NULL        NULL
    3         adrian      NULL      4         Helen       3         2         Biju        4
    3         adrian      NULL      4         Helen       3         5         Micah       4
    
  • and this is what SELECT extracts from the above and gives you back:

    level1  level2  level3
    ------  ------  ------
    adrian  sam     NULL
    adrian  Helen   Biju
    adrian  Helen   Micah
    

Same continues for the rest of the joins. With your example, however, there will be no more rows, as the hierarchy in the table doesn't go deeper than two levels. Consequently, the other two columns in Chris's original query, level4 and level5, will return NULLs:

level1  level2  level3  level4  level5
------  ------  ------  ------  ------
adrian  sam     NULL    NULL    NULL
adrian  Helen   Biju    NULL    NULL
adrian  Helen   Micah   NULL    NULL

If Chris's assumption that four joins should be enough for most cases does not work for your particular case (and I don't mean the simple example in your question, of course), you can throw in more self-joins using the same pattern.