Sql-server – When to Disable and Recreate Indexes in Data Warehouse ETL

data-warehouseetlperformanceperformance-tuningsql serversql-server-2017

We are moving data from Staging Database into Kimball Data Warehouse DB.
I am hearing different ideas regarding when to disable and rebuild indexes.
Stage tables are truncated every day for each new load. Lets assume we have a customer transaction table, 20 million rows, 5GB. Initial load is 20 mil, and incremental is 100,000 rows everyday (less than 5%)

I am reading its best to disable indexes on new batch fresh loads.

(a) If stage tables are truncated everyday, for Stage: Should I disable index and reenable everyday?

Alter index all on dbo.[StageTable] disable

Insert Into StageTable values (CustomerId, SalesDate, SaleAmount)...

Alter index all on dbo.[StageTable] rebuild

In the actual data warehouse,

(b) Should I reenable and rebuild index for initial load. Should I do this for incremental loads also which are minor?

Alter index all on dbo.[DWTable] disable

Insert Into StageTable values (CustomerId, SalesDate, SaleAmount)...

Alter index all on dbo.[DWTable] rebuild

Best Answer

The answer depends on a number of factors:

  • What is your goal: Minimal I/O? Minimal time to load? Minimal effort to set up? 24/7 query performance?
  • How heavy are the indexes: Just one index with on ID column won't make much difference, but several indexes with lots of columns will make a significant difference.
  • What is the bottleneck? If the data load is coming across the network, writing to disk may not be the bottleneck if you're writing to a high-performance SSD subsystem. If you can write the tables and indexes as fast as it comes across the wire, then there's no way you'll make is faster.

So determine what your goal is, and then test each method. For a 5 GB load, it should be a quick test.

Having said all that, for a 5 GB load I wouldn't bother trying to optimize it. SQL Server can write that in just a few seconds. Pick a strategy and run it--if it takes two minutes, then you can only make it about 2 minutes faster--is it worth your time? If you're approaching this as a learning exercise, do it with a significant amount of data--something that will take 30 minutes.