To answer your main question directly, the sorts are there to present rows to update operators (performing deletions in this case) in index key order. The principle at work here is that sorting on the keys will promote sequential access to the index.
This can be a good optimization, though the details depend on your hardware, how likely the affected pages are to be in memory, and whether the sorts can complete within the memory allocated to them. When the optimizer decides the cost of sorting will be paid back by the increased efficiencies associated with sequential index access, it sets a property DMLRequestSort
on the update operator:
The optimizer may also decide to split the update into separate operators to maintain the clustered index (or heap) and then the nonclustered indexes. often, it will decide to sort more than once - first for the clustered index keys, and then again for the nonclustered index(es). Again, where sorting is considered optimal, each index update operator will have the DMLRequestSort
property set to true.
All that said, the things I would fix first would be to eliminate the index scans where the join operator they feed is a nested loops join, and to remove the eager index spools, which are inserting rows into an empty index every time the query is executed. An eager index spool is often the clearest possible sign that you are missing a useful permanent index. The seek predicate in the index spool operator identifies the keys the optimizer would like an index on.
Examples of tables that are missing a nonclustered index (requiring an eager index spool) are:
child6gc8Selections
gc9s
child7s
gc6s
Examples of tables that are currently being scanned below a nested loops join are:
child1
parentObjectMessages
child8s
child7s
child6s
child5s
child4s
child3s
child2s
Taking the example shown above, the Clustered Index Scan has an output list of Id, parentObjectId
, the Nested Loops Join predicate is child7s.parentObjectId = parentObject.Id
, and the join output column list is child7s.Id
.
From that information, it seems a good access method (index) on child7s
for this part of the query would be keyed on parentObjectId
with Id
as an included column. You should be able to work out how best to work this into your existing indexing strategy.
The following are examples of tables where the optimizer is currently choosing a hash join. I would check tables like this to ensure that is a reasonable access method:
child6gc8Selections
gc2s
gc5s
gc6Properties
The table child2bigChild
also participates in a merge join where an explicit sort is necessary. Again, I would check to see if this sort could be avoided.
Once the basic indexing issues are resolved, we can look at other optimizations if necessary.
Because a foreign key can point to a primary key or a unique constraint, and whoever created that foreign key possibly created it before the primary key existed (or they shifted the FK to point to the Unique index while they changed something else about the primary key). This is easy to repro:
CREATE TABLE dbo.MyTable(MyTableID INT NOT NULL, CONSTRAINT myx UNIQUE(MyTableID));
CREATE TABLE dbo.OtherTable1(ID INT FOREIGN KEY REFERENCES dbo.MyTable(MyTableID));
ALTER TABLE dbo.MyTable ADD CONSTRAINT PKmyx PRIMARY KEY(MyTableID);
CREATE TABLE dbo.OtherTable2(ID INT FOREIGN KEY REFERENCES dbo.MyTable(MyTableID));
In fact, both of these foreign keys will point to the first unique constraint defined on that column (myx
).
You can fix the foreign key on the other table by dropping it and re-creating it. You will need to repeat that process for any other tables that point to this column. You can find these easily:
SELECT s.name,t.name,fk.name
FROM sys.foreign_key_columns AS fkc
INNER JOIN sys.foreign_keys AS fk
ON fkc.constraint_object_id = fk.[object_id]
INNER JOIN sys.tables AS t
ON fkc.parent_object_id = t.[object_id]
INNER JOIN sys.schemas AS s
ON t.[schema_id] = s.[schema_id]
INNER JOIN sys.columns AS c1
ON c1.[object_id] = fkc.referenced_object_id
AND c1.column_id = fkc.referenced_column_id
AND c1.name = N'MyTableID'
WHERE fkc.referenced_object_id = OBJECT_ID('dbo.MyTable');
Results:
dbo OtherTable1 FK__OtherTable1__ID__32E0915F
dbo OtherTable2 FK__OtherTable2__ID__35BCFE0A
And even generate a script to drop and re-create them (dropping the redundant unique constraint in the meantime):
DECLARE
@sql1 NVARCHAR(MAX) = N'',
@sql2 NVARCHAR(MAX) = N'ALTER TABLE dbo.MyTable DROP CONSTRAINT myx;',
@sql3 NVARCHAR(MAX) = N'';
SELECT
@sql1 += N'
ALTER TABLE ' + QUOTENAME(s.name) + '.' + QUOTENAME(t.name)
+ ' DROP CONSTRAINT ' + QUOTENAME(fk.name) + ';',
@sql3 += N'
ALTER TABLE ' + QUOTENAME(s.name) + '.' + QUOTENAME(t.name)
+ ' ADD CONSTRAINT ' + QUOTENAME(fk.name) + ' FOREIGN KEY '
+ '(' + QUOTENAME(c2.name) + ') REFERENCES dbo.MyTable(MyTableID);'
FROM sys.foreign_key_columns AS fkc
INNER JOIN sys.foreign_keys AS fk
ON fkc.constraint_object_id = fk.[object_id]
INNER JOIN sys.tables AS t
ON fkc.parent_object_id = t.[object_id]
INNER JOIN sys.schemas AS s
ON t.[schema_id] = s.[schema_id]
INNER JOIN sys.columns AS c1
ON c1.[object_id] = fkc.referenced_object_id
AND c1.column_id = fkc.referenced_column_id
AND c1.name = N'MyTableID'
INNER JOIN sys.columns AS c2
ON c2.[object_id] = fkc.parent_object_id
AND c2.column_id = fkc.parent_column_id
WHERE fkc.referenced_object_id = OBJECT_ID('dbo.MyTable');
PRINT @sql1;
PRINT @sql2;
PRINT @sql3;
-- EXEC sp_executesql @sql1;
-- EXEC sp_executesql @sql2;
-- EXEC sp_executesql @sql3;
Results:
ALTER TABLE [dbo].[OtherTable1] DROP CONSTRAINT [FK__OtherTable1__ID__32E0915F];
ALTER TABLE [dbo].[OtherTable2] DROP CONSTRAINT [FK__OtherTable2__ID__35BCFE0A];
ALTER TABLE dbo.MyTable DROP CONSTRAINT myx;
ALTER TABLE [dbo].[OtherTable1] ADD CONSTRAINT [FK__OtherTable1__ID__32E0915F]
FOREIGN KEY ([ID]) REFERENCES dbo.MyTable(MyTableID);
ALTER TABLE [dbo].[OtherTable2] ADD CONSTRAINT [FK__OtherTable2__ID__35BCFE0A]
FOREIGN KEY ([ID]) REFERENCES dbo.MyTable(MyTableID);
This explicitly handles this case, where the constraint only involves a single column. It gets a little more complex if there are multiple columns involved (and this answer is not meant to solve that problem). I also didn't test if this works exactly as coded if the foreign keys point to a redundant unique index (which has the same underlying structure but is created with slightly different DDL). Exercise for the reader. :-)
Best Answer
The (lack of) documentation suggests that this behaviour is an implementation detail, and is therefore undefined and subject to change at any time.
This is in stark contrast to CREATE FULLTEXT INDEX, where you have to specify the name of an index to attach to -- AFAIK, there is no undocumented
FOREIGN KEY
syntax to do the equivalent (though theoretically, there could be in the future).As mentioned, it does make sense that SQL Server chooses the smallest physical index with which to associate the foreign key. If you change the script to create the unique constraint as
CLUSTERED
, the script "works" on 2008 R2. But that behaviour is still undefined and should not be relied upon.As with most legacy applications, you'll just have to get down to the nitty-gritty and clean things up.