We are constructing a data warehouse that initially will be 1 TB and will grow around 20gigs every month.
For certain tables we are doing daily ETL processes and others we are doing weekly/monthly.
When there is a data import going into a table, is it necessary to drop and recreate the indexes?
Is there ever a point to dropping and recreating indexes or are they automatically updated?
The statistics are set to update automatically.
Thank you so much for your help and guidance.
I got this genius script:
SELECT 'ALTER INDEX [' + ix.name + '] ON [' + s.name + '].[' + t.name + '] ' +
CASE WHEN ps.avg_fragmentation_in_percent > 40 THEN 'REBUILD' ELSE 'REORGANIZE' END +
CASE WHEN pc.partition_count > 1 THEN ' PARTITION = ' + cast(ps.partition_number as nvarchar(max)) ELSE '' END
FROM sys.indexes AS ix INNER JOIN sys.tables t
ON t.object_id = ix.object_id
INNER JOIN sys.schemas s
ON t.schema_id = s.schema_id
INNER JOIN (SELECT object_id, index_id, avg_fragmentation_in_percent, partition_number
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, NULL)) ps
ON t.object_id = ps.object_id AND ix.index_id = ps.index_id
INNER JOIN (SELECT object_id, index_id, COUNT(DISTINCT partition_number) AS partition_count
FROM sys.partitions
GROUP BY object_id, index_id) pc
ON t.object_id = pc.object_id AND ix.index_id = pc.index_id
WHERE ps.avg_fragmentation_in_percent > 10 AND
ix.name IS NOT NULL
from here:
http://weblogs.asp.net/okloeten/archive/2009/01/05/6819737.aspx
Do you suggest I run this script daily and based on the findings run the generated code?
Best Answer
If this is cyclical ETL, and you are in a development (i.e. NOT LIVE) data environment, then you definitely should manage your indexes as a part of your load cycle.
I do this for several data sets every month, the largest of which adds around 100 GB monthly to a 5 TB data set.
I have done extensive testing, and from my own experience the most efficient way to load with regard to indexes is:
DISABLE
non-clustered indexes, leaving clustered index intactREBUILD
NC indexesIf you only add rows periodically as part of managed ETL, this is the way to go. This also ensures all your statistics are up to date.
For stats, it's important to note that adding 20GB to a 1TB database won't reach the tipping point for a stats auto-update, so you can add a whole month of data without ever updating statistics.
Rebuilding your NC indexes is a good way around this. You may want to also do a clustered index rebuild periodically if fragmentation gets high (depending on your table structure and your clustered key).