I'm working on an etl task that takes tables from one schema (with no constraints) and inserts them into the same tables (plus constraints).
I don't want to type out a bazillion insert statements, so I wrote some dynamic sql that looks like this:
declare @table varchar(255),
@stgtable varchar(255),
@columns varchar(max),
@sql nvarchar(max)
declare c cursor for
select table_name from INFORMATION_SCHEMA.tables
open c
fetch from c into @table
while @@FETCH_STATUS = 0
begin
set @stgtable = replace(right(@table, len(@table) - 1), 'LZSTG','STG')
--print @stgtable
select @columns = stuff( (
select ', ' + Column_Name from information_schema.columns
where Table_Name = @table
for xml path ('')),1,2,'')
--print @columns
begin try
set @sql = 'truncate table ' + @stgtable
exec sp_executesql @sql
set @sql = 'insert into ' + @stgtable + ' (' + @columns + ') select ' + @columns + ' from ' + @table
print @sql
exec sp_executesql @sql
set @sql = 'select * from ' + @stgtable
exec sp_executesql @sql
end try
begin catch print 'womp womp ' + @stgtable end catch
fetch next from c into @table
end
close c
deallocate c
Problem: If you go through information_schema.tables in its default order, the insert will often fail because of a foreign key violation.
If I want this solution to work, I have to insert into parent tables first and only then insert into child tables.
I've got this, but it is hardly an ironclad solution:
with fks as
(SELECT obj.name AS FK_NAME,
sch.name AS [schema_name],
tab1.name AS [table_name],
col1.name AS [column],
tab2.name AS [referenced_table],
col2.name AS [referenced_column]
FROM sys.foreign_key_columns fkc
INNER JOIN sys.objects obj
ON obj.object_id = fkc.constraint_object_id
INNER JOIN sys.tables tab1
ON tab1.object_id = fkc.parent_object_id
INNER JOIN sys.schemas sch
ON tab1.schema_id = sch.schema_id
INNER JOIN sys.columns col1
ON col1.column_id = parent_column_id AND col1.object_id = tab1.object_id
INNER JOIN sys.tables tab2
ON tab2.object_id = fkc.referenced_object_id
INNER JOIN sys.columns col2
ON col2.column_id = referenced_column_id AND col2.object_id = tab2.object_id)
select t.table_name, fks.* from
information_schema.tables t
left join fks on t.table_name = fks.table_name
order by referenced_table
Question What is the best way to order information_schema so that the parent tables are on top and the child tables are on the bottom?
Best Answer
One problem is that you could have circular references. You may prefer to disable the constraints, do your ETL, and reenable them.
This could also be a matter of timing, such that it may be impossible to populate some data because of when you got the data from the source. For example, you get data from table P and then from table C. But by the time you’re getting the data from table C, you have a row that refers to a row in C that refers to a row in P that wasn’t there a few milliseconds earlier. Or if you do it in the other order, the row in P is deleted before you grab it.
So by all means use a clever script. But you might find you still have problems.
My recommended solution is to question whether you actually want FKs here, or whether you should do some more complex lookups with inferred members instead.