A few details are still not clear, but let's see, what we can do now.
First, having about 100M rows and 231 partitions sounds not that good. The resulting tables will be too small, in turn their number too high - I cannot tell the threshold, but at some point the query planning migt get too expensive. I think it is quite possible that yearly partitions would be enough. Alternatively, if you really want to fetch a whole month at a time, create monthly partitions.
Now to the actual problem.
It is not quite clear to me why you have rows in the parent table. The usual way of partitioning is that the parent is empty, and every row is redirected to one of the children.
At the same time, is you have an index on posted_at
of the parent table (as you have it on the children), finding rows in the parent based on the timestamp is easy.
On the other hand, while I'm not sure which column shared_parent_id
refers to, you can define an index on it, too - looking rows up based on this will be easy, too.
The only thing still has to be added is tell your query to look for parents in the parent table only. Let's have a look at a possible query:
WITH child_messages AS (
SELECT shared_parent_id, {other interesting columns}
FROM messages
WHERE posted_at {matches your needs}
)
SELECT *
FROM child_messages
UNION ALL
SELECT shared_parent_id, {other interesting columns}
FROM ONLY messages -- this way it does not go to the children
WHERE {unclear column} IN (SELECT shared_parent_id FROM child_messages);
The WITH
query may pick up rows from the parent, too - this you may or may not want, adjust the query accordingly.
Furthermore, the performance might not be ideal, in this case there is room for tweaking the query (eg. a JOIN
instead of the IN()
, pushing the query in the WITH
clause into a (sub)query and so on).
And a final notice: varchar(255)
is usually a sign of a value of unknown-before length - if you really want to constrain it, you may want to choose a meaningful limit. Otherwise, an unlimited varchar
(or text
) has a slight performance advantage in PostgreSQL over the limited ones. Furthermore, from your example it seems that shared_parent_id
is a number (integer) - use the best fitting type.
This answer is based upon your question as it stood before the clarification about each level being a different type. Since you have identified the need for different types, I agree with my answer as it originally appeared, and your self-answer documenting how you've approached this problem. Adding a single column to the grandchild table, referencing the top-most table, seems the simplest approach.
I'm leaving the below details in-place in case it helps a future visitor.
I would implement this with a cross reference table.
Below is a SQL Server-specific example; this table contains columns about the entities, including name, etc:
CREATE TABLE dbo.Entities
(
EntityID int NOT NULL
CONSTRAINT PK_Entities
PRIMARY KEY CLUSTERED
IDENTITY(1,1)
, EntityName varchar(30) NOT NULL
);
This table describes their relationships:
CREATE TABLE dbo.EntityRelationships
(
EntityIDParent int NOT NULL
CONSTRAINT FK_EntityRelationships_Parent
FOREIGN KEY
REFERENCES dbo.Entities (EntityID)
, EntityIDChild int NOT NULL
CONSTRAINT FK_EntityRelationships_Child
FOREIGN KEY
REFERENCES dbo.Entities (EntityID)
, CONSTRAINT PK_EntityRelationships
PRIMARY KEY CLUSTERED (EntityIDParent, EntityIDChild)
, CONSTRAINT CK_EntitytRelationships
CHECK ((EntityIDParent <> EntityIDChild))
);
Each relationship must be unique, that is any given parent can only be related to any given child once.
Next, we create an INSTEAD OF DELETE
trigger on the Entities
table that will handle deletes properly, by reparenting any relationships necessary, prior to removing the deleted Entity:
CREATE TRIGGER EntityRelationshipDelete
ON dbo.Entities
INSTEAD OF DELETE
AS
BEGIN
SET NOCOUNT ON;
INSERT INTO dbo.EntityRelationships (EntityIDParent, EntityIDChild)
SELECT erp.EntityIDParent
, erc.EntityIDChild
FROM deleted d
INNER JOIN dbo.EntityRelationships erp ON d.EntityID = erp.EntityIDChild
INNER JOIN dbo.EntityRelationships erc ON d.EntityID = erc.EntityIDParent
EXCEPT --don't create duplicate entries
SELECT er.EntityIDParent, er.EntityIDChild
FROM dbo.EntityRelationships er;
DELETE
FROM dbo.EntityRelationships
FROM dbo.EntityRelationships er
INNER JOIN deleted d ON er.EntityIDChild = d.EntityID OR er.EntityIDParent = d.EntityID;
DELETE
FROM dbo.Entities
FROM dbo.Entities e
INNER JOIN deleted d ON e.EntityID = d.EntityID;
END;
GO
Here we'll test that setup:
INSERT INTO dbo.Entities (EntityName)
VALUES ('Grandparent')
, ('Parent')
, ('Child');
INSERT INTO dbo.EntityRelationships (EntityIDParent, EntityIDChild)
VALUES (1, 2)
, (2, 3);
SELECT Parents.EntityName
, Children.EntityName
FROM dbo.EntityRelationships er
INNER JOIN dbo.Entities Parents ON er.EntityIDParent = Parents.EntityID
INNER JOIN dbo.Entities Children ON er.EntityIDChild = Children.EntityID;
The results of the select above:
╔═════════════╦════════════╗
║ EntityName ║ EntityName ║
╠═════════════╬════════════╣
║ Grandparent ║ Parent ║
║ Parent ║ Child ║
╚═════════════╩════════════╝
Here, we'll delete the "Parent" entity, and re-query the relationships:
DELETE
FROM dbo.Entities
WHERE dbo.Entities.EntityName = 'Parent';
SELECT Parents.EntityName
, Children.EntityName
FROM dbo.EntityRelationships er
INNER JOIN dbo.Entities Parents ON er.EntityIDParent = Parents.EntityID
INNER JOIN dbo.Entities Children ON er.EntityIDChild = Children.EntityID;
The result:
╔═════════════╦════════════╗
║ EntityName ║ EntityName ║
╠═════════════╬════════════╣
║ Grandparent ║ Child ║
╚═════════════╩════════════╝
Note that running DELETE FROM dbo.Entities
(with no WHERE
clause) will delete all rows from both tables.
To show a slight more complex example; imagine you have 2 grandparents, 2 parents, and single child:
INSERT INTO dbo.Entities (EntityName)
VALUES ('Grandparent 1')
, ('Grandparent 2')
, ('Parent 1')
, ('Parent 2')
, ('Child');
INSERT INTO dbo.EntityRelationships (EntityIDParent, EntityIDChild)
VALUES (1, 3)
, (2, 3)
, (1, 4)
, (3, 5)
, (4, 5);
SELECT Parents.EntityName
, Children.EntityName
FROM dbo.EntityRelationships er
INNER JOIN dbo.Entities Parents ON er.EntityIDParent = Parents.EntityID
INNER JOIN dbo.Entities Children ON er.EntityIDChild = Children.EntityID;
╔═══════════════╦════════════╗
║ EntityName ║ EntityName ║
╠═══════════════╬════════════╣
║ Grandparent 1 ║ Parent 1 ║
║ Grandparent 1 ║ Parent 2 ║
║ Grandparent 2 ║ Parent 1 ║
║ Parent 1 ║ Child ║
║ Parent 2 ║ Child ║
╚═══════════════╩════════════╝
If we remove Parent 1
from the Entities
table:
DELETE
FROM dbo.Entities
WHERE dbo.Entities.EntityName = 'Parent 1';
we see this:
╔═══════════════╦════════════╗
║ EntityName ║ EntityName ║
╠═══════════════╬════════════╣
║ Grandparent 1 ║ Parent 2 ║
║ Grandparent 1 ║ Child ║
║ Grandparent 2 ║ Child ║
║ Parent 2 ║ Child ║
╚═══════════════╩════════════╝
This performs a cleanup of our test data:
IF OBJECT_ID(N'dbo.EntityRelationships', N'U') IS NOT NULL
DROP TABLE dbo.EntityRelationships;
IF OBJECT_ID(N'dbo.Entities', N'U') IS NOT NULL
DROP TABLE dbo.Entities;
GO
Best Answer
This query should do it:
The anti-semi-join with
NOT EXISTS
only keeps children with no other active parent than the given A. This query does not care whether the given parentA
is active itself.Should be as fast as possible.
There are several basic techniques to check there is no related row in another table:
Related answer using a
LEFT JOIN
: