Sql-server – Clustered Index during data load in MS SQL Server

performancesql serverssms

Until today, I was under impression that it's been considered as a best practice to disable the indexes in sql tables during the data load. And with the fact that one cannot disable the clustered indexes in MS Sql server, you would have to delete that and then perform the data load operation. But today, I performed the data load operation of around 30M records with 10-15 columns of different column types and I was surprised to see the result where keeping the clustered index out performs the load compared to dropping the clustered index, loading data and then creating the clustered index.

In my test, the data which was being inserted was presorted in the source table and that might be one reason for the difference, but I just want to understand is it still considered as a best practice to drop the clustered index prior to data load and recreate it after or keeping the clustered index is the optimal way?

Best Answer

It's important to note that a clustered index isn't really just an index, it is also the physical structure/ordering of the pages that comprise the table, not just the B-Tree of the primary key.

This is one of the areas why Sybase/SQL Server sort of dropped the ball in naming, while other vendors (IBM/Oracle) did a better job referring to them as "Index-organized tables" which IMO illustrates the concept more clearly.

So if you drop the clustered index, the table turns into a heap. When you re-enable it, the table has to be physically rebuilt, which requires sorting data and moving pages - both expensive operations.