I would probably abandon the INFORMATION_SCHEMA
views here and use the new sys.
views (as opposed to the backward compatibility ones) instead, or at least materialize the results of the JOIN
into an indexed table first.
Recursive CTEs always get the same basic plan in SQL Server where each row is added to a stack spool and processed one by one. This means that the join between REFERENTIAL_CONSTRAINTS RC, KEY_COLUMN_USAGE KCU1, KEY_COLUMN_USAGE KCU2
will happen as many times as the result of the following query SELECT COUNT(*) FROM MyRecursion
.
Which I assume in your case (from the 11 mins execution time) is probably many thousands of times so you need the recursive part to be as efficient as possible. Your query will be performing the following type of thing thousands of times.
SELECT
KCU1.TABLE_CATALOG,
KCU1.TABLE_SCHEMA,
KCU1.TABLE_NAME
FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS RC
INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE KCU1
ON KCU1.CONSTRAINT_CATALOG = RC.CONSTRAINT_CATALOG
AND KCU1.CONSTRAINT_SCHEMA = RC.CONSTRAINT_SCHEMA
AND KCU1.CONSTRAINT_NAME = RC.CONSTRAINT_NAME
INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE KCU2
ON KCU2.CONSTRAINT_CATALOG = RC.UNIQUE_CONSTRAINT_CATALOG
AND KCU2.CONSTRAINT_SCHEMA = RC.UNIQUE_CONSTRAINT_SCHEMA
AND KCU2.CONSTRAINT_NAME = RC.UNIQUE_CONSTRAINT_NAME
AND KCU2.ORDINAL_POSITION = KCU1.ORDINAL_POSITION
WHERE KCU2.TABLE_NAME = 'FOO'
(Side note: Both versions of your query will return incorrect results if same table name in different schemas)
As you can see the plan for this is pretty horrible.
Comparing this with the plan for your sys
query which is somewhat simpler.
SELECT OnTable = OnTable.name,
AgainstTable = AgainstTable.name
FROM sysforeignkeys fk
INNER JOIN sysobjects OnTable
ON fk.fkeyid = OnTable.id
INNER JOIN sysobjects AgainstTable
ON fk.rkeyid = AgainstTable.id
WHERE AgainstTable.name = 'FOO'
you may be able to encourage the intermediate materialisation without creating a #temp
table explicitly by changing the definition of MyData
to
MyData AS
(
SELECT TOP 99.999999 PERCENT
TABLE_NAME AS OnTable
,Fkeys.AgainstTable AS AgainstTable
FROM INFORMATION_SCHEMA.TABLES
LEFT JOIN Fkeys
ON TABLE_NAME = Fkeys.OnTable
WHERE (1=1)
AND TABLE_TYPE = 'BASE TABLE'
AND TABLE_NAME NOT IN ('sysdiagrams', 'dtproperties')
ORDER BY TABLE_NAME
)
Testing against Adventureworks2008
on my machine this brought the runtime down from about 10 seconds to 250ms (after the first run was out of the way as the plan took 2 seconds to compile). It adds an eager spool to the plan that materialises the result of the Join on the first recursive call then replays it on subsequent calls. This behaviour is not guaranteed however and you may want to upvote the Connect item request Provide a hint to force intermediate materialization of CTEs or derived tables
I would feel safer creating the #temp
table explicitly as below rather than relying on this behaviour.
CREATE TABLE #MyData
(
OnTable SYSNAME,
AgainstTable NVARCHAR(128) NULL,
UNIQUE CLUSTERED (AgainstTable, OnTable)
);
WITH Fkeys AS
(
SELECT DISTINCT
KCU1.TABLE_NAME AS OnTable
,KCU2.TABLE_NAME AS AgainstTable
FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS RC
LEFT JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE KCU1
ON KCU1.CONSTRAINT_CATALOG = RC.CONSTRAINT_CATALOG
AND KCU1.CONSTRAINT_SCHEMA = RC.CONSTRAINT_SCHEMA
AND KCU1.CONSTRAINT_NAME = RC.CONSTRAINT_NAME
LEFT JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE KCU2
ON KCU2.CONSTRAINT_CATALOG = RC.UNIQUE_CONSTRAINT_CATALOG
AND KCU2.CONSTRAINT_SCHEMA = RC.UNIQUE_CONSTRAINT_SCHEMA
AND KCU2.CONSTRAINT_NAME = RC.UNIQUE_CONSTRAINT_NAME
AND KCU2.ORDINAL_POSITION = KCU1.ORDINAL_POSITION
WHERE (1=1)
AND KCU1.TABLE_NAME <> KCU2.TABLE_NAME
)
,MyData AS
(
SELECT
TABLE_NAME AS OnTable
,Fkeys.AgainstTable AS AgainstTable
FROM INFORMATION_SCHEMA.TABLES
LEFT JOIN Fkeys
ON TABLE_NAME = Fkeys.OnTable
WHERE (1=1)
AND TABLE_TYPE = 'BASE TABLE'
AND TABLE_NAME NOT IN ('sysdiagrams', 'dtproperties')
)
INSERT INTO #MyData
SELECT *
FROM MyData;
WITH MyRecursion AS
(
-- base case
SELECT
OnTable AS TableName
,1 AS Lvl
FROM #MyData
WHERE 1=1
AND AgainstTable IS NULL
-- recursive case
UNION ALL
SELECT
OnTable AS TableName
,r.Lvl + 1 AS Lvl
FROM #MyData d
INNER JOIN MyRecursion r
ON d.AgainstTable = r.TableName
)
SELECT
MAX(Lvl) AS Lvl
,TableName
--,strSql = 'delete from [' + tablename + ']'
FROM
MyRecursion
GROUP BY
TableName
ORDER BY Lvl
DROP TABLE #MyData
or alternatively
Best Answer
Thanks to mrdenny! Your commment pointed me in the right direction!
I've pasted the query into SQL Management Studio
Then used the option Display Estimated Execution Plan
This gave me an output that an index in one of the tables belonging to the view needed to be recreated, so I deleted it and create it from scratch using the suggested output.
Now the query takes only 2 seconds and not the 1:30min that it used to take!
Thanks a lot guys