Mysql – Loop through Self JOIN on table until the operand column is NULL completely

MySQLtree

For the problem I have a single table which store a Member's ID and His Referee's ID. This is something like a Multi-Level-Marketing plan where you join a company for some tasks and then someone join from your reference and appear under your downline and so on and so forth. The scheme I am using is not necessarily a binary tree, there can be many users under a single user.

I want to retrieve the entire tree under a specific user, I am using below query to retrieve the records but this is not the solution I want.

SELECT t1.user AS lvl1, t2.user AS lvl2, t3.user AS lvl3,
       t4.user AS lvl4, t5.user AS lvl5
FROM reference t1  
LEFT JOIN reference t2 ON t2.referee = t1.user  
LEFT JOIN reference t3 ON t3.referee = t2.user  
LEFT JOIN reference t4 ON t4.referee = t3.user  
LEFT JOIN reference t5 ON t5.referee = t4.user  
WHERE t1.referee = 1

This query will get me the tree structure under user 1 but only 5 level deep.

Now I want to retrieve the entire tree upto last level but cant figure out a way for same. Tree depth can be anything depending on the users referrals so I cannot even guess the depth (That wont make sense anyway).

All I want right now is to perform the JOIN and keep watching the selected column (in our case user) from last JOIN to check if all the values in it are NULL. If the column from last JOIN is NULL then it means the tree is completed and query should return the result.

Can anyone suggest a workaround?
I can make changes in the table structure if necessary but that should not effect the behavior of other operations.

Best Answer

Although MySQL has no CTE functionality, there are two major ways to create CTE expressions:

TECHNIQUE #1 : Write Stored Procedures to Traverse Recursively

Rather than Reinventing the Wheel, please see my past posts on how to make stored procedures

TECHNIQUE #2 : Iterate Based on the Tree's Maximum Depth

First, here is some sample data

DROP DATABASE IF EXISTS dufran;
CREATE DATABASE dufran;
USE dufran;
CREATE TABLE reference 
(
    user INT NOT NULL AUTO_INCREMENT,
    referee INT NOT NULL DEFAULT 0,
    name VARCHAR(20) NOT NULL,
    PRIMARY KEY (user)
);
INSERT INTO reference (name,referee) VALUES ('Rolando',0);
INSERT INTO reference (name,referee) VALUES ('Pamela',1);
INSERT INTO reference (name,referee) VALUES ('Carlik',2);
INSERT INTO reference (name,referee) VALUES ('Javonne',2);
INSERT INTO reference (name,referee) VALUES ('Dominique',2);
INSERT INTO reference (name,referee) VALUES ('Diamond',2);
INSERT INTO reference (name,referee) VALUES ('Azalia',3);

When loaded, this is the data

mysql> SELECT * FROM reference;
+------+---------+-----------+
| user | referee | name      |
+------+---------+-----------+
|    1 |       0 | Rolando   |
|    2 |       1 | Pamela    |
|    3 |       2 | Carlik    |
|    4 |       2 | Javonne   |
|    5 |       2 | Dominique |
|    6 |       2 | Diamond   |
|    7 |       3 | Azalia    |
+------+---------+-----------+
7 rows in set (0.00 sec)

mysql>

Here is how you compute the maximum depth

SET @depth = 0;
SELECT MAX(@depth:=@depth+1) max_depth FROM
(SELECT DISTINCT referee FROM reference) A;

Here is the maximum depth for the sample data

mysql> SET @depth = 0;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT MAX(@depth:=@depth+1) max_depth FROM
    -> (SELECT DISTINCT referee FROM reference) A;
+-----------+
| max_depth |
+-----------+
|         4 |
+-----------+
1 row in set (0.00 sec)

mysql>

Using this principle, iterate for the SELECT columns and the LEFT JOIN clauses

SET @depth = 0;
SELECT GROUP_CONCAT(CONCAT('t',dep,'.user AS lvl',dep))
INTO @sqllist FROM
(SELECT @depth:=@depth+1 dep FROM
(SELECT DISTINCT referee FROM reference) AA) A;
SET @depth1 = 0;
SET @depth2 = 1;
SELECT GROUP_CONCAT(sqljoin SEPARATOR ' ') INTO @sqljoins
FROM (SELECT CONCAT('LEFT JOIN reference t',@depth2:=@depth2+1,
' ON t',@depth2,'.referee = t',@depth1:=@depth1+1,'.user') sqljoin
FROM (SELECT DISTINCT referee FROM reference) AA) A;
SELECT @sqljoins;
SET @sqlstmt = CONCAT('SELECT ',@sqllist,
' FROM reference t1 ',@sqljoins);
SELECT @sqllist\G
SELECT @sqljoins\G
SELECT @sqlstmt\G

Here is the SQL generated

mysql> SET @depth = 0;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT GROUP_CONCAT(CONCAT('t',dep,'.user AS lvl',dep))
    -> INTO @sqllist FROM
    -> (SELECT @depth:=@depth+1 dep FROM
    -> (SELECT DISTINCT referee FROM reference) AA) A;
Query OK, 1 row affected (0.00 sec)

mysql> SET @depth1 = 0;
Query OK, 0 rows affected (0.00 sec)

mysql> SET @depth2 = 1;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT GROUP_CONCAT(sqljoin SEPARATOR ' ') INTO @sqljoins
    -> FROM (SELECT CONCAT('LEFT JOIN reference t',@depth2:=@depth2+1,
    -> ' ON t',@depth2,'.referee = t',@depth1:=@depth1+1,'.user') sqljoin
    -> FROM (SELECT DISTINCT referee FROM reference) AA) A;
Query OK, 1 row affected (0.00 sec)

mysql>     SELECT @sqllist\G
*************************** 1. row ***************************
@sqllist: t1.user AS lvl1,t2.user AS lvl2,t3.user AS lvl3,t4.user AS lvl4
1 row in set (0.00 sec)

mysql>     SELECT @sqljoins\G
*************************** 1. row ***************************
@sqljoins: LEFT JOIN reference t2 ON t2.referee = t1.user LEFT JOIN reference t3 ON t3.referee = t2.user LEFT JOIN reference t4 ON t4.referee = t3.user LEFT JOIN reference t5 ON t5.referee = t4.user
1 row in set (0.00 sec)

mysql>     SELECT @sqlstmt\G
*************************** 1. row ***************************
@sqlstmt: SELECT t1.user AS lvl1,t2.user AS lvl2,t3.user AS lvl3,t4.user AS lvl4 FROM reference t1 LEFT JOIN reference t2 ON t2.referee = t1.user LEFT JOIN reference t3 ON t3.referee = t2.user LEFT JOIN reference t4 ON t4.referee = t3.user LEFT JOIN reference t5 ON t5.referee = t4.user
1 row in set (0.00 sec)

mysql>

Here is the big question, does the SQL work ??? Take the SQL and run it dynamically:

PREPARE s FROM @sqlstmt;
EXECUTE s;
DEALLOCATE PREPARE s;

Here is the result:

mysql> PREPARE s FROM @sqlstmt;
Query OK, 0 rows affected (0.00 sec)
Statement prepared

mysql> EXECUTE s;
+------+------+------+------+
| lvl1 | lvl2 | lvl3 | lvl4 |
+------+------+------+------+
|    1 |    2 |    3 |    7 |
|    1 |    2 |    4 | NULL |
|    1 |    2 |    5 | NULL |
|    1 |    2 |    6 | NULL |
|    2 |    3 |    7 | NULL |
|    2 |    4 | NULL | NULL |
|    2 |    5 | NULL | NULL |
|    2 |    6 | NULL | NULL |
|    3 |    7 | NULL | NULL |
|    4 | NULL | NULL | NULL |
|    5 | NULL | NULL | NULL |
|    6 | NULL | NULL | NULL |
|    7 | NULL | NULL | NULL |
+------+------+------+------+
13 rows in set (0.03 sec)

mysql> DEALLOCATE PREPARE s;
Query OK, 0 rows affected (0.00 sec)

mysql>

I scripted this to give you a starting point and some homework. Here is what I mean: You probably don't want to see NULL columns. You could do one of two things:

  1. Convert each NULL to IFNULL(lvl1,''), GROUP_CONCAT all columns.
  2. Compute the depth of a particular id and gather columns for it only.

Give it a Try !!!

UPDATE 2016-06-14 16:36 EDT

Someone just commented

Your max depth calculation is wrong. Proof: Add to your sample data: INSERT INTO reference (name,referee) VALUES ('Maria',4);

Let's try it out

mysql> INSERT INTO reference (name,referee) VALUES ('Maria',4);
Query OK, 1 row affected (0.00 sec)

mysql> SET @depth = 0;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT MAX(@depth:=@depth+1) max_depth FROM
    -> (SELECT DISTINCT referee FROM reference) A;
+-----------+
| max_depth |
+-----------+
|         5 |
+-----------+
1 row in set (0.00 sec)

OK, it says the depth is 5. Is there visible proof of this ?

mysql> SET @depth = 0;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT GROUP_CONCAT(CONCAT('t',dep,'.user AS lvl',dep))
    -> INTO @sqllist FROM
    -> (SELECT @depth:=@depth+1 dep FROM
    -> (SELECT DISTINCT referee FROM reference) AA) A;
Query OK, 1 row affected (0.00 sec)

mysql> SET @depth1 = 0;
Query OK, 0 rows affected (0.00 sec)

mysql> SET @depth2 = 1;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT GROUP_CONCAT(sqljoin SEPARATOR ' ') INTO @sqljoins
    -> FROM (SELECT CONCAT('LEFT JOIN reference t',@depth2:=@depth2+1,
    -> ' ON t',@depth2,'.referee = t',@depth1:=@depth1+1,'.user') sqljoin
    -> FROM (SELECT DISTINCT referee FROM reference) AA) A;
Query OK, 1 row affected (0.00 sec)

mysql> SELECT @sqljoins;
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| @sqljoins                                                                                                                                                                                                                                  |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| LEFT JOIN reference t2 ON t2.referee = t1.user LEFT JOIN reference t3 ON t3.referee = t2.user LEFT JOIN reference t4 ON t4.referee = t3.user LEFT JOIN reference t5 ON t5.referee = t4.user LEFT JOIN reference t6 ON t6.referee = t5.user |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> SET @sqlstmt = CONCAT('SELECT ',@sqllist,
    -> ' FROM reference t1 ',@sqljoins);
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT @sqllist\G
*************************** 1. row ***************************
@sqllist: t1.user AS lvl1,t2.user AS lvl2,t3.user AS lvl3,t4.user AS lvl4,t5.user AS lvl5
1 row in set (0.00 sec)

mysql> SELECT @sqljoins\G
*************************** 1. row ***************************
@sqljoins: LEFT JOIN reference t2 ON t2.referee = t1.user LEFT JOIN reference t3 ON t3.referee = t2.user LEFT JOIN reference t4 ON t4.referee = t3.user LEFT JOIN reference t5 ON t5.referee = t4.user LEFT JOIN reference t6 ON t6.referee = t5.user
1 row in set (0.00 sec)

mysql> SELECT @sqlstmt\G
*************************** 1. row ***************************
@sqlstmt: SELECT t1.user AS lvl1,t2.user AS lvl2,t3.user AS lvl3,t4.user AS lvl4,t5.user AS lvl5 FROM reference t1 LEFT JOIN reference t2 ON t2.referee = t1.user LEFT JOIN reference t3 ON t3.referee = t2.user LEFT JOIN reference t4 ON t4.referee = t3.user LEFT JOIN reference t5 ON t5.referee = t4.user LEFT JOIN reference t6 ON t6.referee = t5.user
1 row in set (0.00 sec)

mysql> PREPARE s FROM @sqlstmt;
Query OK, 0 rows affected (0.00 sec)
Statement prepared

Guess what the output says ?

mysql> EXECUTE s;
+------+------+------+------+------+
| lvl1 | lvl2 | lvl3 | lvl4 | lvl5 |
+------+------+------+------+------+
|    1 |    2 |    3 |    7 | NULL |
|    1 |    2 |    4 |    8 | NULL |
|    1 |    2 |    5 | NULL | NULL |
|    1 |    2 |    6 | NULL | NULL |
|    2 |    3 |    7 | NULL | NULL |
|    2 |    4 |    8 | NULL | NULL |
|    2 |    5 | NULL | NULL | NULL |
|    2 |    6 | NULL | NULL | NULL |
|    3 |    7 | NULL | NULL | NULL |
|    4 |    8 | NULL | NULL | NULL |
|    5 | NULL | NULL | NULL | NULL |
|    6 | NULL | NULL | NULL | NULL |
|    7 | NULL | NULL | NULL | NULL |
|    8 | NULL | NULL | NULL | NULL |
+------+------+------+------+------+
14 rows in set (0.00 sec)

mysql> DEALLOCATE PREPARE s;
Query OK, 0 rows affected (0.00 sec)

mysql>

Just as my code predicted. The depth is 5 and the Dynamic SQL just proved it.