SQL Server – Join Two Tables and Include One Row from Parent Table

greatest-n-per-groupjoin;sql serversql-server-2016union

I've got a simple join over a primary and foreign key… but I need to include a row from the parent table in the results for each primary key as a sort of header row for the children. For instance, I have two tables,

parents
ID | Name
0  | FOO
1  | BAR
2  | ASD

children
ID | Name | ParentID
10 | baz  | 0
11 | zap  | 0
12 | zam  | 1
13 | maz  | 1

I'd like to make a view that returns

ParentID | ChildID | Name | isChild
0        |         | FOO  | False
0        | 10      | baz  | True
0        | 11      | zap  | True
1        |         | BAR  | False
1        | 12      | zam  | True
1        | 13      | maz  | True
2        |         | ASD  | False

The data structure doesn't make much sense but it's populating 3rd party software and that's the only format option. Right now I'm hitting the database with an additional query for each parent. I imagine pulling it into one query would be more efficient, I've only got a few hundred parents

Thanks!

Best Answer

Perhaps I'm missing something, but I don't understand why this is not a simple UNION.

--Demo setup
Declare @Parent table (ID int, Name varchar(30))
insert into @Parent(id, Name) values(0,'FOO'),(1,'BAR'),(2,'ASD')
Declare @Children table (ID int, Name varchar(30),ParentId int)
insert into @Children(id, Name,ParentId) values(10,'bas',0),(11,'zap',0),(12,'zam',1),(13,'maz',1)

--The solution
;with UnionedData as
(
SELECT ID AS ParentId
    ,' ' AS ChildID
    ,NAME
    ,'False' AS isChild
FROM @Parent

UNION ALL

SELECT p.Id
    ,convert(VARCHAR(5), c.id) AS ChildID
    ,c.NAME
    ,'True' AS IsChild
FROM @Parent p
JOIN @Children c
    ON c.ParentId = p.id
)
SELECT *
FROM UnionedData
ORDER BY ParentId
    ,ChildID

| ParentId | ChildID | NAME | isChild |
|----------|---------|------|---------|
| 0        |         | FOO  | False   |
| 0        | 10      | bas  | True    |
| 0        | 11      | zap  | True    |
| 1        |         | BAR  | False   |
| 1        | 12      | zam  | True    |
| 1        | 13      | maz  | True    |
| 2        |         | ASD  | False   |

Related Solutions

Sql-server – Performance tuning a cascading delete with many foreign keys

To answer your main question directly, the sorts are there to present rows to update operators (performing deletions in this case) in index key order. The principle at work here is that sorting on the keys will promote sequential access to the index.

This can be a good optimization, though the details depend on your hardware, how likely the affected pages are to be in memory, and whether the sorts can complete within the memory allocated to them. When the optimizer decides the cost of sorting will be paid back by the increased efficiencies associated with sequential index access, it sets a property DMLRequestSort on the update operator:

DML Request Sort

The optimizer may also decide to split the update into separate operators to maintain the clustered index (or heap) and then the nonclustered indexes. often, it will decide to sort more than once - first for the clustered index keys, and then again for the nonclustered index(es). Again, where sorting is considered optimal, each index update operator will have the DMLRequestSort property set to true.

All that said, the things I would fix first would be to eliminate the index scans where the join operator they feed is a nested loops join, and to remove the eager index spools, which are inserting rows into an empty index every time the query is executed. An eager index spool is often the clearest possible sign that you are missing a useful permanent index. The seek predicate in the index spool operator identifies the keys the optimizer would like an index on.

Examples of tables that are missing a nonclustered index (requiring an eager index spool) are:

child6gc8Selections
gc9s
child7s
gc6s

eager index spools

Examples of tables that are currently being scanned below a nested loops join are:

child1
parentObjectMessages
child8s
child7s
child6s
child5s
child4s
child3s
child2s

scan below nested loops

Taking the example shown above, the Clustered Index Scan has an output list of Id, parentObjectId, the Nested Loops Join predicate is child7s.parentObjectId = parentObject.Id, and the join output column list is child7s.Id.

From that information, it seems a good access method (index) on child7s for this part of the query would be keyed on parentObjectId with Id as an included column. You should be able to work out how best to work this into your existing indexing strategy.

The following are examples of tables where the optimizer is currently choosing a hash join. I would check tables like this to ensure that is a reasonable access method:

child6gc8Selections
gc2s
gc5s
gc6Properties

hash joins

The table child2bigChild also participates in a merge join where an explicit sort is necessary. Again, I would check to see if this sort could be avoided.

sort before merge

Once the basic indexing issues are resolved, we can look at other optimizations if necessary.

SQL Server Hierarchical Order – Parent-Child Tree Hierarchical ORDER BY

OK, enough brain cells are dead.

SQL Fiddle

WITH cte AS
(
  SELECT 
    [ICFilterID], 
    [ParentID],
    [FilterDesc],
    [Active],
    CAST(0 AS varbinary(max)) AS Level
  FROM [dbo].[ICFilters]
  WHERE [ParentID] = 0
  UNION ALL
  SELECT 
    i.[ICFilterID], 
    i.[ParentID],
    i.[FilterDesc],
    i.[Active],  
    Level + CAST(i.[ICFilterID] AS varbinary(max)) AS Level
  FROM [dbo].[ICFilters] i
  INNER JOIN cte c
    ON c.[ICFilterID] = i.[ParentID]
)

SELECT 
  [ICFilterID], 
  [ParentID],
  [FilterDesc],
  [Active]
FROM cte
ORDER BY [Level];

Best Answer

Related Solutions

Sql-server – Performance tuning a cascading delete with many foreign keys

SQL Server Hierarchical Order – Parent-Child Tree Hierarchical ORDER BY

Related Question