Sql-server – How does SQL Server choose an index key for a foreign key reference

indexprimary-keysql serversql-server-2008-r2

I am working with a legacy database that was imported from MS Access. There are about twenty tables with non-clustered, unique primary keys that were created during the MS Access > SQL Server upgrade.

Many of these tables also have unique, non-clustered indexes that are duplicates of the primary key.

I am attempting to clean this up.

But what I have found is after I recreate the primary keys as clustered indexes, and then try to rebuild the foreign key, the foreign key is referencing the old, duplicate index (which was unique).

I know this because it won't let me drop the duplicate indexes.

I would think SQL Server would always choose a primary key if one existed. Does SQL Server have a method of choosing between a unique index and a primary key?

To duplicate the problem (on SQL Server 2008 R2):

IF EXISTS (SELECT * FROM sys.tables WHERE name = 'Child') DROP TABLE Child
GO
IF EXISTS (SELECT * FROM sys.tables WHERE name = 'Parent') DROP TABLE Parent
GO

-- Create the parent table
CREATE TABLE Parent (ParentID INT NOT NULL IDENTITY(1,1)) 

-- Make the parent table a heap
ALTER TABLE Parent ADD CONSTRAINT PK_Parent PRIMARY KEY NONCLUSTERED (ParentID) 

-- Create the duplicate index on the parent table
CREATE UNIQUE NONCLUSTERED INDEX IX_Parent ON Parent (ParentID) 

-- Create the child table
CREATE TABLE Child  (ChildID  INT NOT NULL IDENTITY(1,1), ParentID INT NOT NULL ) 

-- Give the child table a normal PKey
ALTER TABLE Child ADD CONSTRAINT PK_Child PRIMARY KEY CLUSTERED (ChildID) 

-- Create a foreign key relationship with the Parent table on ParentID
ALTER TABLE Child ADD CONSTRAINT FK_Child FOREIGN KEY (ParentID) 
REFERENCES Parent (ParentID) ON DELETE CASCADE NOT FOR REPLICATION

-- Try to clean this up
-- Drop the foreign key constraint on the Child table
ALTER TABLE Child DROP CONSTRAINT FK_Child

-- Drop the primary key constraint on the Parent table
ALTER TABLE Parent DROP CONSTRAINT PK_Parent

-- Recreate the primary key on Parent as a clustered index
ALTER TABLE Parent ADD CONSTRAINT PK_Parent PRIMARY KEY CLUSTERED (ParentID) 

-- Recreate the foreign key in Child pointing to parent ID
ALTER TABLE Child ADD CONSTRAINT FK_Child FOREIGN KEY (ParentID) 
REFERENCES Parent (ParentID) ON DELETE CASCADE NOT FOR REPLICATION

-- Try to drop the duplicate index on Parent 
DROP INDEX IX_Parent ON Parent

Error msg:

Msg 3723, Level 16, State 6, Line 36 An explicit DROP INDEX is not
allowed on index 'Parent.IX_Parent'. It is being used for FOREIGN KEY
constraint enforcement.

Best Answer

The (lack of) documentation suggests that this behaviour is an implementation detail, and is therefore undefined and subject to change at any time.

This is in stark contrast to CREATE FULLTEXT INDEX, where you have to specify the name of an index to attach to -- AFAIK, there is no undocumented FOREIGN KEY syntax to do the equivalent (though theoretically, there could be in the future).

As mentioned, it does make sense that SQL Server chooses the smallest physical index with which to associate the foreign key. If you change the script to create the unique constraint as CLUSTERED, the script "works" on 2008 R2. But that behaviour is still undefined and should not be relied upon.

As with most legacy applications, you'll just have to get down to the nitty-gritty and clean things up.

Related Solutions

Sql-server – Performance tuning a cascading delete with many foreign keys

To answer your main question directly, the sorts are there to present rows to update operators (performing deletions in this case) in index key order. The principle at work here is that sorting on the keys will promote sequential access to the index.

This can be a good optimization, though the details depend on your hardware, how likely the affected pages are to be in memory, and whether the sorts can complete within the memory allocated to them. When the optimizer decides the cost of sorting will be paid back by the increased efficiencies associated with sequential index access, it sets a property DMLRequestSort on the update operator:

DML Request Sort

The optimizer may also decide to split the update into separate operators to maintain the clustered index (or heap) and then the nonclustered indexes. often, it will decide to sort more than once - first for the clustered index keys, and then again for the nonclustered index(es). Again, where sorting is considered optimal, each index update operator will have the DMLRequestSort property set to true.

All that said, the things I would fix first would be to eliminate the index scans where the join operator they feed is a nested loops join, and to remove the eager index spools, which are inserting rows into an empty index every time the query is executed. An eager index spool is often the clearest possible sign that you are missing a useful permanent index. The seek predicate in the index spool operator identifies the keys the optimizer would like an index on.

Examples of tables that are missing a nonclustered index (requiring an eager index spool) are:

child6gc8Selections
gc9s
child7s
gc6s

eager index spools

Examples of tables that are currently being scanned below a nested loops join are:

child1
parentObjectMessages
child8s
child7s
child6s
child5s
child4s
child3s
child2s

scan below nested loops

Taking the example shown above, the Clustered Index Scan has an output list of Id, parentObjectId, the Nested Loops Join predicate is child7s.parentObjectId = parentObject.Id, and the join output column list is child7s.Id.

From that information, it seems a good access method (index) on child7s for this part of the query would be keyed on parentObjectId with Id as an included column. You should be able to work out how best to work this into your existing indexing strategy.

The following are examples of tables where the optimizer is currently choosing a hash join. I would check tables like this to ensure that is a reasonable access method:

child6gc8Selections
gc2s
gc5s
gc6Properties

hash joins

The table child2bigChild also participates in a merge join where an explicit sort is necessary. Again, I would check to see if this sort could be avoided.

sort before merge

Once the basic indexing issues are resolved, we can look at other optimizations if necessary.

Sql-server – Unable to drop non-PK index because it is referenced in a foreign key constraint

Because a foreign key can point to a primary key or a unique constraint, and whoever created that foreign key possibly created it before the primary key existed (or they shifted the FK to point to the Unique index while they changed something else about the primary key). This is easy to repro:

CREATE TABLE dbo.MyTable(MyTableID INT NOT NULL, CONSTRAINT myx UNIQUE(MyTableID));

CREATE TABLE dbo.OtherTable1(ID INT FOREIGN KEY REFERENCES dbo.MyTable(MyTableID));

ALTER TABLE dbo.MyTable ADD CONSTRAINT PKmyx PRIMARY KEY(MyTableID);

CREATE TABLE dbo.OtherTable2(ID INT FOREIGN KEY REFERENCES dbo.MyTable(MyTableID));

In fact, both of these foreign keys will point to the first unique constraint defined on that column (myx).

You can fix the foreign key on the other table by dropping it and re-creating it. You will need to repeat that process for any other tables that point to this column. You can find these easily:

SELECT s.name,t.name,fk.name
FROM sys.foreign_key_columns AS fkc
INNER JOIN sys.foreign_keys AS fk
ON fkc.constraint_object_id = fk.[object_id]
INNER JOIN sys.tables AS t
ON fkc.parent_object_id = t.[object_id]
INNER JOIN sys.schemas AS s
ON t.[schema_id] = s.[schema_id]
INNER JOIN sys.columns AS c1
ON c1.[object_id] = fkc.referenced_object_id
AND c1.column_id = fkc.referenced_column_id
AND c1.name = N'MyTableID'
WHERE fkc.referenced_object_id = OBJECT_ID('dbo.MyTable');

Results:

dbo    OtherTable1    FK__OtherTable1__ID__32E0915F
dbo    OtherTable2    FK__OtherTable2__ID__35BCFE0A

And even generate a script to drop and re-create them (dropping the redundant unique constraint in the meantime):

DECLARE 
  @sql1 NVARCHAR(MAX) = N'', 
  @sql2 NVARCHAR(MAX) = N'ALTER TABLE dbo.MyTable DROP CONSTRAINT myx;', 
  @sql3 NVARCHAR(MAX) = N'';

SELECT 
  @sql1 += N'
ALTER TABLE ' + QUOTENAME(s.name) + '.' + QUOTENAME(t.name)
  + ' DROP CONSTRAINT ' + QUOTENAME(fk.name) + ';',
  @sql3 += N'
ALTER TABLE ' + QUOTENAME(s.name) + '.' + QUOTENAME(t.name)
  + ' ADD CONSTRAINT ' + QUOTENAME(fk.name) + ' FOREIGN KEY '
  + '(' + QUOTENAME(c2.name) + ') REFERENCES dbo.MyTable(MyTableID);'
FROM sys.foreign_key_columns AS fkc
INNER JOIN sys.foreign_keys AS fk
ON fkc.constraint_object_id = fk.[object_id]
INNER JOIN sys.tables AS t
ON fkc.parent_object_id = t.[object_id]
INNER JOIN sys.schemas AS s
ON t.[schema_id] = s.[schema_id]
INNER JOIN sys.columns AS c1
ON c1.[object_id] = fkc.referenced_object_id
AND c1.column_id = fkc.referenced_column_id
AND c1.name = N'MyTableID'
INNER JOIN sys.columns AS c2
ON c2.[object_id] = fkc.parent_object_id
AND c2.column_id = fkc.parent_column_id
WHERE fkc.referenced_object_id = OBJECT_ID('dbo.MyTable');

PRINT @sql1;
PRINT @sql2;
PRINT @sql3;
-- EXEC sp_executesql @sql1;
-- EXEC sp_executesql @sql2;
-- EXEC sp_executesql @sql3;

Results:

ALTER TABLE [dbo].[OtherTable1] DROP CONSTRAINT [FK__OtherTable1__ID__32E0915F];
ALTER TABLE [dbo].[OtherTable2] DROP CONSTRAINT [FK__OtherTable2__ID__35BCFE0A];

ALTER TABLE dbo.MyTable DROP CONSTRAINT myx;

ALTER TABLE [dbo].[OtherTable1] ADD CONSTRAINT [FK__OtherTable1__ID__32E0915F] 
  FOREIGN KEY ([ID]) REFERENCES dbo.MyTable(MyTableID);
ALTER TABLE [dbo].[OtherTable2] ADD CONSTRAINT [FK__OtherTable2__ID__35BCFE0A] 
  FOREIGN KEY ([ID]) REFERENCES dbo.MyTable(MyTableID);

This explicitly handles this case, where the constraint only involves a single column. It gets a little more complex if there are multiple columns involved (and this answer is not meant to solve that problem). I also didn't test if this works exactly as coded if the foreign keys point to a redundant unique index (which has the same underlying structure but is created with slightly different DDL). Exercise for the reader. :-)

Best Answer

Related Solutions

Sql-server – Performance tuning a cascading delete with many foreign keys

Sql-server – Unable to drop non-PK index because it is referenced in a foreign key constraint

Related Question