SQLite – Count Referenced Rows Using Self-Referencing Table

self-joinsqlite

media_subscriptions
+----+--------------------------+
| id | target_metadata_item_id  |
+----+--------------------------+
| 30 | 19039                    |
| 31 | NULL                     |
| 32 | NULL                     |
| 33 | NULL                     |
| 34 | NULL                     |
| 35 | NULL                     |
| 36 | NULL                     |
| 37 | 18976                    |
+----+--------------------------+

metadata_items
+-------+-----------------+-----------+
| id    | guid            | parent_id |
+-------+-----------------+-----------+
| 18976 | id://265        | NULL      |
| 18977 | id://265/1      | 18976     |
| 19039 | id://841        | NULL      |
| 19040 | id://841/1      | 19039     |
| 19041 | id://841/1/1    | 19040     |
| 19224 | id://841/1/2    | 19040     |
| 19966 | id://265/1/1    | 18977     |
| 19967 | id://290        | NULL      |
| 19968 | id://376        | NULL      |
+-------+-----------------+-----------+

I have the above two tables with data. The parent_id column defines how each row in metadata_items relates to any other row, which is also evidential by each row's guid. The parent_id can be nested by a root ID having a child, and that child having a child (if that makes sense). The target_metadata_item_id column from media_subscriptions relates to the id column of the metadata_items table. I want to get a COUNT(*) of any records from the metadata_items table that have an associated record in media_subscriptions based on a search of guid. Ultimately, if there is a matching record, the count should only ever be 1, and 0 if not, but I'm using COUNT(*) just in case there's ever a duplicate record in the media_subscriptions table, and then I'm just checking elsewhere if it returns > 0.

Some examples of what I'm trying to accomplish:

  • guid='id://841' would return 1 because its root parent id (19039) is referenced by id 30 in media_subscriptions.
  • guid='id://841/1/1' would return 1 because its root parent id (19039) is referenced by id 30 in media_subscriptions.
  • guid='id://290' would return 0 because its root parent id (19967) is NOT referenced by ANY column in media_subscriptions.

Here's what I've got right now, which obviously isn't working as I'd intend (take it easy on me, I'm a noob with database administration, and am just starting to learn).

SELECT COUNT(*) FROM `metadata_items` AS `t1`
 LEFT JOIN `metadata_items` AS `t2` ON t1.id = t2.parent_id
 INNER JOIN `media_subscriptions` AS `subscriptions` ON t1.id = subscriptions.target_metadata_item_id
 WHERE t1.guid='id://841'

This correctly returns 1, but if I switch out the last line with WHERE t1.guid='id://265/1/1' it returns 0 where I'd like it to return 1. Obviously I'm doing this very incorrectly, but I'm really not sure how to go about getting where I need to be.

Best Answer

Test

WITH RECURSIVE cte AS
(
SELECT id, guid, parent_id, id _id_, guid _guid_, parent_id _parent_id_
FROM metadata_items
UNION ALL
SELECT cte.id, cte.guid, cte.parent_id, src.id, src.guid, src.parent_id
FROM cte
JOIN metadata_items src ON cte._parent_id_ = src.id
)
SELECT cte.id, cte.guid, cte.parent_id, MAX(target_metadata_item_id) IS NOT NULL 
FROM cte 
LEFT JOIN media_subscriptions src ON cte._id_ = src.target_metadata_item_id
GROUP BY cte.id, cte.guid, cte.parent_id
ORDER BY cte.id;

fiddle

If you need the data for one separate record only then add proper WHERE.