Sql-server – When should indexes be dropped and recreated

data-warehousesql serversql-server-2008

We are constructing a data warehouse that initially will be 1 TB and will grow around 20gigs every month.

For certain tables we are doing daily ETL processes and others we are doing weekly/monthly.

When there is a data import going into a table, is it necessary to drop and recreate the indexes?

Is there ever a point to dropping and recreating indexes or are they automatically updated?

The statistics are set to update automatically.

Thank you so much for your help and guidance.

I got this genius script:

SELECT 'ALTER INDEX [' + ix.name + '] ON [' + s.name + '].[' + t.name + '] ' +
       CASE WHEN ps.avg_fragmentation_in_percent > 40 THEN 'REBUILD' ELSE 'REORGANIZE' END +
       CASE WHEN pc.partition_count > 1 THEN ' PARTITION = ' + cast(ps.partition_number as nvarchar(max)) ELSE '' END
FROM   sys.indexes AS ix INNER JOIN sys.tables t
           ON t.object_id = ix.object_id
       INNER JOIN sys.schemas s
           ON t.schema_id = s.schema_id
       INNER JOIN (SELECT object_id, index_id, avg_fragmentation_in_percent, partition_number
                   FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, NULL)) ps
           ON t.object_id = ps.object_id AND ix.index_id = ps.index_id
       INNER JOIN (SELECT object_id, index_id, COUNT(DISTINCT partition_number) AS partition_count
                   FROM sys.partitions
                   GROUP BY object_id, index_id) pc
           ON t.object_id = pc.object_id AND ix.index_id = pc.index_id
WHERE  ps.avg_fragmentation_in_percent > 10 AND
       ix.name IS NOT NULL

from here:

http://weblogs.asp.net/okloeten/archive/2009/01/05/6819737.aspx

Do you suggest I run this script daily and based on the findings run the generated code?

Best Answer

If this is cyclical ETL, and you are in a development (i.e. NOT LIVE) data environment, then you definitely should manage your indexes as a part of your load cycle.

I do this for several data sets every month, the largest of which adds around 100 GB monthly to a 5 TB data set.

I have done extensive testing, and from my own experience the most efficient way to load with regard to indexes is:

  1. DISABLE non-clustered indexes, leaving clustered index intact
  2. Perform load of raw into your data table
  3. REBUILD NC indexes

If you only add rows periodically as part of managed ETL, this is the way to go. This also ensures all your statistics are up to date.

For stats, it's important to note that adding 20GB to a 1TB database won't reach the tipping point for a stats auto-update, so you can add a whole month of data without ever updating statistics.

Rebuilding your NC indexes is a good way around this. You may want to also do a clustered index rebuild periodically if fragmentation gets high (depending on your table structure and your clustered key).