Sql-server – What’s the best way to order a list of tables based on their foreign key constraints

sql serversql-server-2016

I'm working on an etl task that takes tables from one schema (with no constraints) and inserts them into the same tables (plus constraints).

I don't want to type out a bazillion insert statements, so I wrote some dynamic sql that looks like this:

declare @table varchar(255),
@stgtable varchar(255),
@columns varchar(max),
@sql nvarchar(max)

declare c cursor for

select table_name from INFORMATION_SCHEMA.tables

open c

fetch from c into @table

while @@FETCH_STATUS = 0
begin

set @stgtable = replace(right(@table, len(@table) - 1), 'LZSTG','STG')

--print @stgtable

select @columns = stuff( (
    select ', ' + Column_Name from information_schema.columns
    where Table_Name = @table
    for xml path ('')),1,2,'')
--print @columns

begin try
set @sql = 'truncate table ' + @stgtable 
exec sp_executesql @sql
set @sql = 'insert into ' +  @stgtable  + ' (' + @columns + ') select ' + @columns + ' from ' + @table

print @sql
exec sp_executesql @sql

set @sql = 'select * from ' + @stgtable 
exec sp_executesql @sql
end try

begin catch print 'womp womp ' + @stgtable end catch

fetch next from c into @table
end

close c
deallocate c

Problem: If you go through information_schema.tables in its default order, the insert will often fail because of a foreign key violation.

If I want this solution to work, I have to insert into parent tables first and only then insert into child tables.

I've got this, but it is hardly an ironclad solution:

with fks as 
(SELECT  obj.name AS FK_NAME,
    sch.name AS [schema_name],
    tab1.name AS [table_name],
    col1.name AS [column],
    tab2.name AS [referenced_table],
    col2.name AS [referenced_column]
FROM sys.foreign_key_columns fkc
INNER JOIN sys.objects obj
    ON obj.object_id = fkc.constraint_object_id
INNER JOIN sys.tables tab1
    ON tab1.object_id = fkc.parent_object_id
INNER JOIN sys.schemas sch
    ON tab1.schema_id = sch.schema_id
INNER JOIN sys.columns col1
    ON col1.column_id = parent_column_id AND col1.object_id = tab1.object_id
INNER JOIN sys.tables tab2
    ON tab2.object_id = fkc.referenced_object_id
INNER JOIN sys.columns col2
    ON col2.column_id = referenced_column_id AND col2.object_id = tab2.object_id)

select t.table_name, fks.* from 
information_schema.tables t
left join fks on t.table_name = fks.table_name
order by referenced_table

Question What is the best way to order information_schema so that the parent tables are on top and the child tables are on the bottom?

Best Answer

One problem is that you could have circular references. You may prefer to disable the constraints, do your ETL, and reenable them.

This could also be a matter of timing, such that it may be impossible to populate some data because of when you got the data from the source. For example, you get data from table P and then from table C. But by the time you’re getting the data from table C, you have a row that refers to a row in C that refers to a row in P that wasn’t there a few milliseconds earlier. Or if you do it in the other order, the row in P is deleted before you grab it.

So by all means use a clever script. But you might find you still have problems.

My recommended solution is to question whether you actually want FKs here, or whether you should do some more complex lookups with inferred members instead.