Question: I'm porting the below query (listing tables by foreign key dependencies) to PostGreSql.
WITH Fkeys AS (
SELECT DISTINCT
OnTable = OnTable.name
,AgainstTable = AgainstTable.name
FROM sysforeignkeys fk
INNER JOIN sysobjects onTable
ON fk.fkeyid = onTable.id
INNER JOIN sysobjects againstTable
ON fk.rkeyid = againstTable.id
WHERE 1=1
AND AgainstTable.TYPE = 'U'
AND OnTable.TYPE = 'U'
-- ignore self joins; they cause an infinite recursion
AND OnTable.Name <> AgainstTable.Name
)
,MyData AS (
SELECT
OnTable = o.name
,AgainstTable = FKeys.againstTable
FROM sys.objects o
LEFT JOIN FKeys
ON o.name = FKeys.onTable
WHERE (1=1)
AND o.type = 'U'
AND o.name NOT LIKE 'sys%'
)
,MyRecursion AS (
-- base case
SELECT
TableName = OnTable
,Lvl = 1
FROM MyData
WHERE 1=1
AND AgainstTable IS NULL
-- recursive case
UNION ALL
SELECT
TableName = OnTable
,Lvl = r.Lvl + 1
FROM MyData d
INNER JOIN MyRecursion r
ON d.AgainstTable = r.TableName
)
SELECT
Lvl = MAX(Lvl)
,TableName
--,strSql = 'delete from [' + tablename + ']'
FROM
MyRecursion
GROUP BY
TableName
ORDER BY lvl
/*
ORDER BY
2 ASC
,1 ASC
*/
Using information_schema, the query looks like this:
WITH Fkeys AS
(
SELECT DISTINCT
KCU1.TABLE_NAME AS OnTable
,KCU2.TABLE_NAME AS AgainstTable
FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS RC
LEFT JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE KCU1
ON KCU1.CONSTRAINT_CATALOG = RC.CONSTRAINT_CATALOG
AND KCU1.CONSTRAINT_SCHEMA = RC.CONSTRAINT_SCHEMA
AND KCU1.CONSTRAINT_NAME = RC.CONSTRAINT_NAME
LEFT JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE KCU2
ON KCU2.CONSTRAINT_CATALOG = RC.UNIQUE_CONSTRAINT_CATALOG
AND KCU2.CONSTRAINT_SCHEMA = RC.UNIQUE_CONSTRAINT_SCHEMA
AND KCU2.CONSTRAINT_NAME = RC.UNIQUE_CONSTRAINT_NAME
AND KCU2.ORDINAL_POSITION = KCU1.ORDINAL_POSITION
WHERE (1=1)
AND KCU1.TABLE_NAME <> KCU2.TABLE_NAME
)
,MyData AS
(
SELECT
TABLE_NAME AS OnTable
,FKeys.againstTable AS AgainstTable
FROM INFORMATION_SCHEMA.TABLES
LEFT JOIN FKeys
ON TABLE_NAME = FKeys.onTable
WHERE (1=1)
AND TABLE_TYPE = 'BASE TABLE'
AND TABLE_NAME NOT IN ('sysdiagrams', 'dtproperties')
)
,MyRecursion AS
(
-- base case
SELECT
OnTable AS TableName
,1 AS Lvl
FROM MyData
WHERE 1=1
AND AgainstTable IS NULL
-- recursive case
UNION ALL
SELECT
OnTable AS TableName
,r.Lvl + 1 AS Lvl
FROM MyData d
INNER JOIN MyRecursion r
ON d.AgainstTable = r.TableName
)
SELECT
MAX(Lvl) AS Lvl
,TableName
--,strSql = 'delete from [' + tablename + ']'
FROM
MyRecursion
GROUP BY
TableName
ORDER BY lvl
/*
ORDER BY
2 ASC
,1 ASC
*/
My question now is:
In SQL Server (tested on 2008 R2): Why does the query jump from 1 second to 11 minutes when I replace
SELECT DISTINCT
OnTable = OnTable.name
,AgainstTable = AgainstTable.name
FROM sysforeignkeys fk
INNER JOIN sysobjects onTable
ON fk.fkeyid = onTable.id
INNER JOIN sysobjects againstTable
ON fk.rkeyid = againstTable.id
WHERE 1=1
AND AgainstTable.TYPE = 'U'
AND OnTable.TYPE = 'U'
-- ignore self joins; they cause an infinite recursion
AND OnTable.Name <> AgainstTable.Name
with
SELECT DISTINCT
KCU1.TABLE_NAME AS OnTable
,KCU2.TABLE_NAME AS AgainstTable
FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS RC
LEFT JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE KCU1
ON KCU1.CONSTRAINT_CATALOG = RC.CONSTRAINT_CATALOG
AND KCU1.CONSTRAINT_SCHEMA = RC.CONSTRAINT_SCHEMA
AND KCU1.CONSTRAINT_NAME = RC.CONSTRAINT_NAME
LEFT JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE KCU2
ON KCU2.CONSTRAINT_CATALOG = RC.UNIQUE_CONSTRAINT_CATALOG
AND KCU2.CONSTRAINT_SCHEMA = RC.UNIQUE_CONSTRAINT_SCHEMA
AND KCU2.CONSTRAINT_NAME = RC.UNIQUE_CONSTRAINT_NAME
AND KCU2.ORDINAL_POSITION = KCU1.ORDINAL_POSITION
WHERE (1=1)
AND KCU1.TABLE_NAME <> KCU2.TABLE_NAME
???
As far as I can tell, there really is no significant speed difference when running only the partial queries separately. Also the result set is entirely the same (I checked every row in Excel), though ordering differs.
Below the working PostGreSQL version (finished in 35 ms on exactly the same db content [75 tables]…)
— No warranty whatsoever —
WITH RECURSIVE Fkeys AS
(
SELECT DISTINCT
KCU1.TABLE_NAME AS OnTable
,KCU2.TABLE_NAME AS AgainstTable
FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS RC
LEFT JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE KCU1
ON KCU1.CONSTRAINT_CATALOG = RC.CONSTRAINT_CATALOG
AND KCU1.CONSTRAINT_SCHEMA = RC.CONSTRAINT_SCHEMA
AND KCU1.CONSTRAINT_NAME = RC.CONSTRAINT_NAME
LEFT JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE KCU2
ON KCU2.CONSTRAINT_CATALOG = RC.UNIQUE_CONSTRAINT_CATALOG
AND KCU2.CONSTRAINT_SCHEMA = RC.UNIQUE_CONSTRAINT_SCHEMA
AND KCU2.CONSTRAINT_NAME = RC.UNIQUE_CONSTRAINT_NAME
AND KCU2.ORDINAL_POSITION = KCU1.ORDINAL_POSITION
)
,MyData AS
(
SELECT
TABLE_NAME AS OnTable
,FKeys.againstTable AS AgainstTable
FROM INFORMATION_SCHEMA.TABLES
LEFT JOIN FKeys
ON TABLE_NAME = FKeys.onTable
WHERE (1=1)
AND TABLE_TYPE = 'BASE TABLE'
AND TABLE_SCHEMA = 'public'
--AND TABLE_NAME NOT IN ('sysdiagrams', 'dtproperties')
)
,MyRecursion AS
(
-- base case
SELECT
OnTable AS TableName
,1 AS Lvl
FROM MyData
WHERE 1=1
AND AgainstTable IS NULL
-- recursive case
UNION ALL
SELECT
OnTable AS TableName
,r.Lvl + 1 AS Lvl
FROM MyData d
INNER JOIN MyRecursion r
ON d.AgainstTable = r.TableName
)
SELECT
MAX(Lvl) AS Lvl
,TableName
--,strSql = 'delete from [' + tablename + ']'
FROM
MyRecursion
GROUP BY
TableName
ORDER BY lvl
/*
ORDER BY
2 ASC
,1 ASC
*/
It also seems that
AND KCU1.TABLE_NAME <> KCU2.TABLE_NAME
is superfluous when using information_schema, so it should actually be faster.
Best Answer
I would probably abandon the
INFORMATION_SCHEMA
views here and use the newsys.
views (as opposed to the backward compatibility ones) instead, or at least materialize the results of theJOIN
into an indexed table first.Recursive CTEs always get the same basic plan in SQL Server where each row is added to a stack spool and processed one by one. This means that the join between
REFERENTIAL_CONSTRAINTS RC, KEY_COLUMN_USAGE KCU1, KEY_COLUMN_USAGE KCU2
will happen as many times as the result of the following querySELECT COUNT(*) FROM MyRecursion
.Which I assume in your case (from the 11 mins execution time) is probably many thousands of times so you need the recursive part to be as efficient as possible. Your query will be performing the following type of thing thousands of times.
(Side note: Both versions of your query will return incorrect results if same table name in different schemas)
As you can see the plan for this is pretty horrible.
Comparing this with the plan for your
sys
query which is somewhat simpler.you may be able to encourage the intermediate materialisation without creating a
#temp
table explicitly by changing the definition ofMyData
toTesting against
Adventureworks2008
on my machine this brought the runtime down from about 10 seconds to 250ms (after the first run was out of the way as the plan took 2 seconds to compile). It adds an eager spool to the plan that materialises the result of the Join on the first recursive call then replays it on subsequent calls. This behaviour is not guaranteed however and you may want to upvote the Connect item request Provide a hint to force intermediate materialization of CTEs or derived tablesI would feel safer creating the
#temp
table explicitly as below rather than relying on this behaviour.or alternatively