I load data into a data warehouse.
Presently I have a script that I use to drop foreign key constraints and indexes prior to loading the data, for convenience and speed. There is a big window in which I can do the load so I don't need to worry about users accessing the data during the load, but I don't want to impact unrelated data in other tables in the database.
I have done some research here and elsewhere to come up with this script but I wonder if there are some things that I might be overlooking which could either make performance sub-optimal or if I might be missing something important (I don't know … calculated columns or something?) or maybe I'm doing things in the wrong order etc.
Any advice appreciated to make this robust and performant.
Disable Constraints and Indexes
Edit: I removed the WHILE
loop which commenters helped me realise was redundant.
Declare @schema varchar(128) = 'dbo';
Declare @sql nvarchar(max) = N'';
-- 1. Indices
-- Select a list of indexes in the schema and generate statements to disable them.
Select @sql = @sql + 'ALTER INDEX ' + QuoteName(idx.name) + ' ON ' + QuoteName(@schema) + '.' + QuoteName(obj.name) + ' DISABLE;' + CHAR(13)
From sys.indexes As idx
Join sys.objects As obj On idx.object_id = obj.object_id
Where ((obj.type = 'U' And idx.type in (2,6)) -- Non-clustered index/columnstore on a table
Or obj.type = 'V') -- All indexes on indexed views
And obj.schema_id = (Select schema_id From sys.schemas Where name = @schema)
Order By obj.name, idx.name;
Execute sp_executesql @sql;
-- 2. Foreign-key constraints
-- Build a list of foreign keys constraints in the schema and generate statements to disable the constraint checking.
Select @sql = @sql + 'ALTER TABLE ' + QuoteName(@schema) + '.' + QuoteName(obj.name) + ' NOCHECK CONSTRAINT ' + QuoteName(fk.name) + ';' + CHAR(13)
From sys.foreign_keys As fk
Join sys.objects As obj On fk.parent_object_id = obj.object_id
Where obj.schema_id = (Select schema_id From sys.schemas Where name = @schema);
Execute sp_executesql @sql;
Enable constraints, rebuild indexes and update statistics
Declare @schema nvarchar(128) = 'dbo';
Declare @sql nvarchar(max) = N'';
-- 1. Indices
-- Build a list of tables in the schema and generate statements to enable the indices on them.
Select @sql = @sql + 'ALTER INDEX ' + QuoteName(idx.name) + ' ON ' + QuoteName(@schema) + '.' + QuoteName(obj.name) + ' REBUILD' + iif(idx.type = 6, ' WITH (MAXDOP = 1);', ' WITH (FILLFACTOR = 100);') + CHAR(13)
From sys.indexes idx
Join sys.objects obj ON obj.object_id = idx.object_id
Where ((obj.type = 'U' And idx.type in (2,6)) -- Non-clustered index on a table
Or obj.type = 'V') -- All indexes on indexed views
And obj.schema_id = (Select schema_id From sys.schemas Where name = @schema)
And idx.is_disabled = 1 -- Don't rebuild indexes that are already online
And idx.is_hypothetical = 0 -- Don't rebuild hypothetical indexes!
Order By iif(idx.type = 6, 1, 2), obj.name, idx.name;
Execute sp_executesql @sql;
-- 2. Foreign-key constraints
-- Build a list of foreign keys constraints in the schema and generate statements to enable them with checking.
Select @sql = @sql + 'ALTER TABLE ' + QuoteName(@schema) + '.' + QuoteName(obj.name) + ' WITH CHECK CHECK CONSTRAINT ' + QuoteName(fk.name) + ';' + CHAR(13)
From sys.foreign_keys fk
Join sys.objects obj ON obj.object_id = fk.parent_object_id
Where obj.schema_id = (Select schema_id From sys.schemas Where name = @schema)
Order By obj.name, fk.name;
Execute sp_executesql @sql;
-- 3. Statistics
-- Build a list of tables in the schema and generate statements to update the statistics on them.
Select @sql = @sql + 'UPDATE STATISTICS ' + QuoteName(@schema) + '.' + QuoteName(obj.name) + ' WITH COLUMNS;' + CHAR(13)
From sys.objects obj
Where obj.type = 'U' -- User defined
AND obj.schema_id = (Select schema_id From sys.schemas Where name = @schema)
Order By obj.name;
Execute sp_executesql @sql;
Best Answer
For a few things that you may have not considered:
For performance:
ONLINE = OFF
option for possibly a small performance boost due to less locking.SORT_IN_TEMPDB = ON
option for a performance boost.