MySQL – How to Write Queries for Hierarchical Data

hierarchyjoin;MySQL

This table is a questions and answers CMS and i want to know how data is fetched from these tables.

+------+----------+-------+
|  id  | parentid | type  |
+------+----------+-------+
|  1   |     0    |   Q   |
+------+----------+-------+
|  2   |     1    |   A   |
+------+----------+-------+
|  3   |     0    |   Q   |
+------+----------+-------+
|  4   |     2    |   C   |
+------+----------+-------+
|  5   |     2    |   C   |
+------+----------+-------+
|  6   |     1    |   A   |
+------+----------+-------+
|  7   |     3    |   C   |
+------+----------+-------+
|  8   |     1    |   C   |
+------+----------+-------+
|  9   |     0    |   Q   |
+------+----------+-------+
|  10  |     9    |   C   |
+------+----------+-------+
|  11  |     9    |   A   |
+------+----------+-------+
|  12  |    11    |   C   |
+------+----------+-------+

For example, the results like below.

+------+----------+-------+
|  id  | parentid | type  |
+------+----------+-------+
|  1   |     0    |   Q   |
+------+----------+-------+
|  8   |     1    |   C   |
+------+----------+-------+
|  2   |     1    |   A   |
+------+----------+-------+
|  4   |     2    |   C   |
+------+----------+-------+
|  5   |     2    |   C   |
+------+----------+-------+
|  6   |     1    |   A   |
+------+----------+-------+

This query are not results sorted as above.and in many tutorials advised that the JOIN is better than it used to be for this table.

SELECT * FROM qa_posts WHERE postid=1
UNION
SELECT * FROM qa_posts WHERE parentid=1
UNION
SELECT * FROM qa_posts WHERE parentid IN (SELECT postid parentid FROM qa_posts WHERE parentid=1)
ORDER BY parentid;

Best Answer

Your structure i commonly known as an Adjacency List (should perhaps be Set) Model. If you don't want to change the structure of your model, you basically have to do something similar to what you are currently doing. You can rewrite your query using joins like:

SELECT q1.* FROM qa_posts q1
WHERE q1.postid=1
UNION
SELECT q2.* FROM qa_posts q1
            JOIN qa_posts q2
                on q1.postid=q2.parentid
WHERE q1.postid=1
UNION
SELECT q3.* FROM qa_posts q1
            JOIN qa_posts q2
                on q1.postid=q2.parentid
            JOIN qa_posts q3
                on q2.postid=q3.parentid
WHERE q1.postid=1;

If you are willing to change the model you can search for Nested Set (example http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/), Materialized Path (example http://bojanz.wordpress.com/2014/04/25/storing-hierarchical-data-materialized-path/) and Transitive Closure Table (at the end of http://www.slideshare.net/billkarwin/models-for-hierarchical-data, at the beginning the other two models are described)

Edit: Adding DFS with different weights according to type:

select postid, parentid, type 
from (
    SELECT q1.*     
        , lpad(q1.parentid,8,case q1.type when 'Q' then 'a' 
                                          when 'C' then 'b' 
                                          when 'A' then 'c' 
                             end)         
        || lpad(q1.postid,8,case q1.type when 'Q' then 'a' 
                                         when 'C' then 'b' 
                                         when 'A' then 'c' 
                             end) as padded_path 
    FROM qa_posts q1 
    WHERE q1.postid=1 
    UNION 
    SELECT q2.*     
       , lpad(q1.parentid,8,case q1.type when 'Q' then 'a' 
                                         when 'C' then 'b' 
                                         when 'A' then 'c' 
                            end)         
        || lpad(q1.postid,8,case q1.type when 'Q' then 'a' 
                                         when 'C' then 'b' 
                                         when 'A' then 'c' 
                            end)         
        || lpad(q2.postid,8,case q2.type when 'Q' then 'a' 
                                         when 'C' then 'b' 
                                         when 'A' then 'c' 
                            end) as padded_path 
    FROM qa_posts q1 
    JOIN qa_posts q2     
        on q1.postid=q2.parentid 
    WHERE q1.postid=1 
    UNION 
    SELECT q3.*     
         , lpad(q1.parentid,8,case q1.type when 'Q' then 'a' 
                                           when 'C' then 'b' 
                                           when 'A' then 'c' 
                              end)         
         || lpad(q1.postid,8,case q1.type when 'Q' then 'a' 
                                          when 'C' then 'b' 
                                          when 'A' then 'c' 
                              end)         
         || lpad(q2.postid,8,case q2.type when 'Q' then 'a' 
                                          when 'C' then 'b' 
                                          when 'A' then 'c' 
                              end)         
         || lpad(q3.postid,8,case q3.type when 'Q' then 'a' 
                                          when 'C' then 'b' 
                                          when 'A' then 'c' 
                              end) as padded_path 
    FROM qa_posts q1 
    JOIN qa_posts q2     
        on q1.postid=q2.parentid 
    JOIN qa_posts q3     
        on q2.postid=q3.parentid 
    WHERE q1.postid=1
) as x 
order by padded_path;

+--------+----------+------+
| postid | parentid | type |
+--------+----------+------+
|      1 |        0 | Q    |
|      8 |        1 | C    |
|      2 |        1 | A    |
|      4 |        2 | C    |
|      5 |        2 | C    |
|      6 |        1 | A    |
+--------+----------+------+
6 rows in set (0.00 sec)

Edit: The other Questions result in (padded_path included):

+--------+----------+------+--------------------------+
| postid | parentid | type | padded_path              |
+--------+----------+------+--------------------------+
|      3 |        0 | Q    | aaaaaaa0aaaaaaa3         |
|      7 |        3 | C    | aaaaaaa0aaaaaaa3bbbbbbb7 |
+--------+----------+------+--------------------------+
2 rows in set (0.00 sec)


+--------+----------+------+----------------------------------+
| postid | parentid | type | padded_path                      |
+--------+----------+------+----------------------------------+
|      9 |        0 | Q    | aaaaaaa0aaaaaaa9                 |
|     10 |        9 | C    | aaaaaaa0aaaaaaa9bbbbbb10         |
|     11 |        9 | A    | aaaaaaa0aaaaaaa9cccccc11         |
|     12 |       11 | C    | aaaaaaa0aaaaaaa9cccccc11bbbbbb12 |
+--------+----------+------+----------------------------------+
4 rows in set (0.00 sec)

Changing the predicate q1.postid=1 to q1.type='Q' results in:

+--------+----------+------+----------------------------------+
| postid | parentid | type | padded_path                      |
+--------+----------+------+----------------------------------+
|      1 |        0 | Q    | aaaaaaa0aaaaaaa1                 |
|      8 |        1 | C    | aaaaaaa0aaaaaaa1bbbbbbb8         |
|      2 |        1 | A    | aaaaaaa0aaaaaaa1ccccccc2         |
|      4 |        2 | C    | aaaaaaa0aaaaaaa1ccccccc2bbbbbbb4 |
|      5 |        2 | C    | aaaaaaa0aaaaaaa1ccccccc2bbbbbbb5 |
|      6 |        1 | A    | aaaaaaa0aaaaaaa1ccccccc6         |
|      3 |        0 | Q    | aaaaaaa0aaaaaaa3                 |
|      7 |        3 | C    | aaaaaaa0aaaaaaa3bbbbbbb7         |
|      9 |        0 | Q    | aaaaaaa0aaaaaaa9                 |
|     10 |        9 | C    | aaaaaaa0aaaaaaa9bbbbbb10         |
|     11 |        9 | A    | aaaaaaa0aaaaaaa9cccccc11         |
|     12 |       11 | C    | aaaaaaa0aaaaaaa9cccccc11bbbbbb12 |
+--------+----------+------+----------------------------------+
12 rows in set (0.00 sec)