What you are looking is simply not facilitated by MySQL as of writing this answer. Common Table Expressions (CTEs) are supported by several databases. @a_horse_with_no_name posted an answer naming those RDBMSs.
I have posted recursive Stored Procedures: Find highest level of a hierarchical field: with vs without CTEs. However, you emphasized not using any Stored Procedures. To do what you are asking does require Stored Procedures in MySQL.
Compromise Solution...
You will still need a Stored Procedure, but not using recursion. Using a simple loop, you can construct a self-join query and executed it dynamically. Given the depth of the tree you are looking for, here is such a Stored Procedure:
DELIMITER $$
DROP PROCEDURE IF EXISTS `dianuj`.`GetTopParentGivenDepth` $$
CREATE PROCEDURE `dianuj`.`GetTopParentGivenDepth` (GivenDepth INT)
BEGIN
DECLARE x1,x2 INT;
SET x = 0;
SET y = 1;
SET @SQ = 'SELECT DISTINCT A0.id FROM prarent A0';
WHILE y < GivenDepth DO
SET @SQ = CONCAT(@SQ,' INNER JOIN prarent A',y,' ON A',x,'.id = A',y,'.parent_id');
SET x = y;
SET y = x + 1;
END WHILE;
SET @SQ = CONCAT(@SQ,' WHERE A0.parent_id = 0');
SELECT @SQ;
PREPARE stmt FROM @SQ;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END $$
DELIMITER ;
After loading your data into MySQL on my desktop, I ran the Stored Procedure with...
Depth 1
mysql> call GetTopParentGivenDepth(1);
+--------------------------------------------------------------+
| @SQLSTMT |
+--------------------------------------------------------------+
| SELECT DISTINCT A0.id FROM prarent A0 WHERE A0.parent_id = 0 |
+--------------------------------------------------------------+
1 row in set (0.00 sec)
+----+
| id |
+----+
| 1 |
| 2 |
| 3 |
+----+
3 rows in set (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
mysql>
Depth 2
mysql> call GetTopParentGivenDepth(2);
+------------------------------------------------------------------------------------------------------------+
| @SQLSTMT |
+------------------------------------------------------------------------------------------------------------+
| SELECT DISTINCT A0.id FROM prarent A0 INNER JOIN prarent A1 ON A0.id = A1.parent_id WHERE A0.parent_id = 0 |
+------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
+----+
| id |
+----+
| 3 |
| 2 |
+----+
2 rows in set (0.01 sec)
Query OK, 0 rows affected (0.01 sec)
mysql>
Depth 3 .. 6
mysql> call GetTopParentGivenDepth(3);
+----------------------------------------------------------------------------------------------------------------------------------------------------------+
| @SQLSTMT |
+----------------------------------------------------------------------------------------------------------------------------------------------------------+
| SELECT DISTINCT A0.id FROM prarent A0 INNER JOIN prarent A1 ON A0.id = A1.parent_id INNER JOIN prarent A2 ON A1.id = A2.parent_id WHERE A0.parent_id = 0 |
+----------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
+----+
| id |
+----+
| 3 |
+----+
1 row in set (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
mysql> call GetTopParentGivenDepth(4);
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| @SQLSTMT |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| SELECT DISTINCT A0.id FROM prarent A0 INNER JOIN prarent A1 ON A0.id = A1.parent_id INNER JOIN prarent A2 ON A1.id = A2.parent_id INNER JOIN prarent A3 ON A2.id = A3.parent_id WHERE A0.parent_id = 0 |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
+----+
| id |
+----+
| 3 |
+----+
1 row in set (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
mysql> call GetTopParentGivenDepth(5);
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| @SQLSTMT |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| SELECT DISTINCT A0.id FROM prarent A0 INNER JOIN prarent A1 ON A0.id = A1.parent_id INNER JOIN prarent A2 ON A1.id = A2.parent_id INNER JOIN prarent A3 ON A2.id = A3.parent_id INNER JOIN prarent A4 ON A3.id = A4.parent_id WHERE A0.parent_id = 0 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
+----+
| id |
+----+
| 3 |
+----+
1 row in set (0.01 sec)
Query OK, 0 rows affected (0.01 sec)
mysql> call GetTopParentGivenDepth(6);
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| @SQLSTMT |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| SELECT DISTINCT A0.id FROM prarent A0 INNER JOIN prarent A1 ON A0.id = A1.parent_id INNER JOIN prarent A2 ON A1.id = A2.parent_id INNER JOIN prarent A3 ON A2.id = A3.parent_id INNER JOIN prarent A4 ON A3.id = A4.parent_id INNER JOIN prarent A5 ON A4.id = A5.parent_id WHERE A0.parent_id = 0 |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
Empty set (0.00 sec)
Query OK, 0 rows affected (0.01 sec)
mysql>
Notice that Depth 6
has no return value. To find the maximum depth, you would have to iterate on it until no rows comes back.
CAVEAT
- The worst case scenario would be to use the number of rows in the table as the depth. Such a tree would be nothing more than a degenerate linked list.
- I cannot give you any guarantees on query performance, such on SQL construction and subsequent execution. Any kind of recursion would be the sole responsibility of the Query Parser.
Give it a Try !!!
This does the job:
CREATE MATERIALIZED VIEW speedy_materialized_view AS
WITH RECURSIVE tree AS (
SELECT id, parent_id, ARRAY[id] AS path
FROM gear_category
WHERE parent_id IS NULL
UNION ALL
SELECT c.id, c.parent_id, path || c.id
FROM tree t
JOIN gear_category c ON c.parent_id = t.id
)
, tree_ct AS (
SELECT t.id, t.path, COALESCE(i.item_ct, 0) AS item_ct
FROM tree t
LEFT JOIN (
SELECT category_id AS id, count(*) AS item_ct
FROM gear_item
GROUP BY 1
) i USING (id)
)
SELECT t.id
, t.item_ct AS count_direct_child_items
, sum(t1.item_ct) AS count_recursive_child_items
FROM tree_ct t
LEFT JOIN tree_ct t1 ON t1.path[1:array_upper(t.path, 1)] = t.path
GROUP BY t.id, t.item_ct;
count_recursive_child_items
is counted separately for each category, so I am not convinced it's the fastest possible way for deep trees.
However, aggregate functions are not allowed in recursive CTEs.
Recursive Function
Strictly speaking, it's really iterative - but so is a "recursive" CTE.
You could build a function working with temp tables. You need to know your way around plpgsql or there is too much to explain.
CREATE OR REPLACE FUNCTION f_tree_ct()
RETURNS TABLE (id int, count_direct_child_items int, count_recursive_child_items int) AS
$func$
DECLARE
_lvl int;
BEGIN
-- basic table with added path and count
CREATE TEMP TABLE t1 AS
WITH RECURSIVE tree AS (
SELECT c.id, c.parent_id, '{}'::int[] AS path, 0 AS lvl
FROM gear_category c
WHERE c.parent_id IS NULL
UNION ALL
SELECT c.id, c.parent_id, path || c.parent_id, lvl + 1
FROM tree t
JOIN gear_category c ON c.parent_id = t.id
)
, tree_ct AS (
SELECT t.id, t.parent_id, t.path, t.lvl, COALESCE(i.item_ct, 0) AS item_ct
FROM tree t
LEFT JOIN (
SELECT i.category_id AS id, count(*)::int AS item_ct
FROM gear_item i
GROUP BY 1
) i USING (id)
)
TABLE tree_ct;
-- CREATE INDEX ON t1 (lvl); -- only for very deep trees
SELECT INTO _lvl max(lvl) FROM t1; -- identify max lvl to start bottom up
-- recursively aggregate each level in 2nd temp table
CREATE TEMP TABLE t2 AS
SELECT t1.id, t1.parent_id, t1.lvl
, t1.item_ct
, t1.item_ct AS sum_ct
FROM t1
WHERE t1.lvl = _lvl;
IF _lvl > 0 THEN
FOR i IN REVERSE _lvl .. 1 LOOP
INSERT INTO t2
SELECT t1.id, t1.parent_id, t1.lvl, t1.item_ct
, CASE WHEN t2.sum_ct IS NULL THEN t1.item_ct ELSE t1.item_ct + t2.sum_ct END
FROM t1
LEFT JOIN (
SELECT t2.parent_id AS id, sum(t2.sum_ct) AS sum_ct
FROM t2
WHERE t2.lvl = i
GROUP BY 1
) t2 USING (id)
WHERE t1.lvl = i - 1;
END LOOP;
END IF;
RETURN QUERY -- only requested columns, unsorted
SELECT t2.id, t2.item_ct, t2.sum_ct FROM t2;
DROP TABLE t1, t2; -- to allow repeated execution in one transaction
RETURN;
END
$func$ LANGUAGE plpgsql;
This cannot be included in a CREATE MATERIALIZED VIEW
statement because of the use of temporary tables. You could just create another (temp) table with it, acting as a manually maintained "materialized view":
CREATE TABLE speedy_materialized_view AS
SELECT * FROM f_tree_ct();
Alternatively you could TRUNCATE speedy_materialized_view
in the function and write to it directly. The function would RETURNS void
instead or you could return some meta-information like a row count ...
SQL Fiddle.
Aside:
Column aliases in the recursive term of a CTE are just for documentation since output column names are determined by the non-recursive term only.
Best Answer
Celko's book is a good resource - if a bit overly "academic" at times.
I also have really found this method, known as 'closure tables' to work fairly well.
If you're using a database that allows recursive CTEs (such as PostgreSQL 8.4 or newer, or SQL Server 2005 or newer), they're really the best way to go. If you're on Oracle, there's always the venerable "connect by".
It is my experience that it's far more common to be handed a set of tables in a "naive tree" schema, and have to figure out how to extract the correct tree from that storage, than it is to have an opportunity to create the cleaner "closure tables" structure.