I would like to select all parent rows for which exists a combination of children rows with desired value. The easiest approach is to use multiple exists clauses, but this is not optimal – anyone knows better solution? Here is a SQL fiddle: http://sqlfiddle.com/#!6/8da76/5
Schema:
CREATE TABLE [dbo].[Parents](
[Id] [bigint] IDENTITY(1,1) NOT NULL,
[GroupId] [uniqueidentifier] NOT NULL
);
INSERT INTO [Parents] ([GroupId]) VALUES ('00000000-0000-0000-0000-000000000001');
INSERT INTO [Parents] ([GroupId]) VALUES ('00000000-0000-0000-0000-000000000002');
INSERT INTO [Parents] ([GroupId]) VALUES ('00000000-0000-0000-0000-000000000003');
INSERT INTO [Parents] ([GroupId]) VALUES ('00000000-0000-0000-0000-000000000004');
INSERT INTO [Parents] ([GroupId]) VALUES ('00000000-0000-0000-0000-000000000005');
CREATE TABLE [dbo].[Childrens](
[Id] [bigint] IDENTITY(1,1) NOT NULL,
[ParentId] [bigint] NOT NULL,
[TypeId] [int] NOT NULL
);
INSERT INTO [Childrens] ([ParentId], [TypeId]) VALUES (1, 1);
INSERT INTO [Childrens] ([ParentId], [TypeId]) VALUES (1, 2);
INSERT INTO [Childrens] ([ParentId], [TypeId]) VALUES (1, 3);
INSERT INTO [Childrens] ([ParentId], [TypeId]) VALUES (1, 4);
INSERT INTO [Childrens] ([ParentId], [TypeId]) VALUES (1, 5);
INSERT INTO [Childrens] ([ParentId], [TypeId]) VALUES (2, 1);
INSERT INTO [Childrens] ([ParentId], [TypeId]) VALUES (2, 4);
INSERT INTO [Childrens] ([ParentId], [TypeId]) VALUES (2, 5);
INSERT INTO [Childrens] ([ParentId], [TypeId]) VALUES (3, 4);
INSERT INTO [Childrens] ([ParentId], [TypeId]) VALUES (3, 5);
INSERT INTO [Childrens] ([ParentId], [TypeId]) VALUES (3, 6);
INSERT INTO [Childrens] ([ParentId], [TypeId]) VALUES (4, 2);
INSERT INTO [Childrens] ([ParentId], [TypeId]) VALUES (4, 6);
INSERT INTO [Childrens] ([ParentId], [TypeId]) VALUES (5, 1);
INSERT INTO [Childrens] ([ParentId], [TypeId]) VALUES (5, 2);
INSERT INTO [Childrens] ([ParentId], [TypeId]) VALUES (5, 3);
Sample query to select all parent rows for which exists a combination of children rows with TypeId = 1 and 2 (query should return parent with Id = 1 and 5):
SELECT *
FROM Parents p
WHERE EXISTS(
SELECT *
FROM Childrens c
WHERE p.Id = c.ParentId
AND c.TypeId = 1 )
AND EXISTS(
SELECT *
FROM Childrens c
WHERE p.Id = c.ParentId
AND c.TypeId = 2 );
Sample query to select all parent rows for which exists children row with TypeId = 4, 5 and 6 (should return parent with Id = 3)
SELECT *
FROM Parents p
WHERE EXISTS(
SELECT *
FROM Childrens c
WHERE p.Id = c.ParentId
AND c.TypeId = 4 )
AND EXISTS(
SELECT *
FROM Childrens c
WHERE p.Id = c.ParentId
AND c.TypeId = 5 )
AND EXISTS(
SELECT *
FROM Childrens c
WHERE p.Id = c.ParentId
AND c.TypeId = 6 );
Number of conditions for children TypeId is variable.
Best Regards!
Best Answer
Below is one method. In the outer WHERE clause, specify the number of types included in the
IN
clause. An composite index on theChildrens
tableParentId
andTypeId
will help optimize the query.EDIT: The original DDL in the question didn't include constraints or indexes. These aren't relevant for query functionality but are important to compare performance of alternative solutions.
Assuming the
IDENTITY
columns in both tables are also the primary keys and the tables are related, I added the constraints below.I added candidate indexes on columns specified on JOIN or WHERE clauses and specified
UNIQUE
for those indexes that include the primary key column. Indexes not deemed to be used can be dropped afterward.Finally, I ran the original query in my answer in SSMS along with the query below which Tpsamw1 proposed in a comment (but without ORDER BY to level the playing field).
SET STATISTICS IO ON
and including the actual execution plan:SSMS reported comparable performance (50% of batch cost for each query) but that doesn't tell the whole story. The STATISTICS IO showed a higher number of logical reads for my original query, which I believe a better indicator of performance. This suggests Tpsamw1's query performs better. Below are the stats:
My original query:
Tpsamw1 query:
I then loaded additional data by copying the original test data and update stats:
Running the queries again shows execution plan changes to both queries. Both queries yielded the same number of logical IOs, indicating performance is be about the same. The only execution plan difference was an inner merge join versus a right outer merge join.