The clustered index "is" the table. Rebuilding the index will require enough space to duplicate each existing non-deleted row, page-by-page. Since you don't have Enterprise Edition, you do not have the luxury of rebuilding the index online. This means the table will not be accessible during the rebuild operation.
You'd likely be better off rebuilding by creating a duplicate empty table on a different drive that has plenty of free space, with room for foreseeable growth, moving the data row-by-row in key order, then renaming the old and new tables. For instance, if the original table is Table1 and the new copy is Table2, rename Table1 to Table1-old, then rename Table2 to Table1.
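A minimal sketch of the swap, assuming Table2 has already been created on the roomier drive with the same clustered index definition (the clustering key column is illustrative, and an underscore replaces the hyphen so the old name remains a regular identifier):

```sql
-- Load the new copy in clustering-key order under a table lock.
INSERT INTO dbo.Table2 WITH (TABLOCK)
SELECT *
FROM dbo.Table1
ORDER BY KeyCol;   -- the clustered index key

-- Swap the names.
EXEC sp_rename 'dbo.Table1', 'Table1_old';
EXEC sp_rename 'dbo.Table2', 'Table1';
```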
Achieving optimal import performance in this scenario requires three things:
- Minimally-logged base table inserts
- Minimally-logged nonclustered index builds
- Avoiding physical reads
Minimal Logging
Achieving minimally-logged inserts to an empty clustered table without nonclustered indexes requires:
- Using either the SIMPLE or BULK_LOGGED database recovery models
- Specifying a table lock and ordered input (e.g. TABLOCK and ORDER hints)
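These conditions can be combined in a single load statement. A sketch assuming the input is a flat file (the database name, file path, table, and key column are all illustrative):

```sql
ALTER DATABASE ImportDb SET RECOVERY SIMPLE;   -- or BULK_LOGGED

BULK INSERT dbo.Table1
FROM 'C:\import\table1.dat'
WITH
(
    TABLOCK,                -- table lock, required for minimal logging
    ORDER (KeyCol ASC)      -- input sorted on the clustered index key
);
```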
Side note:
It is also possible to achieve minimally-logged inserts to a clustered table that has nonclustered indexes, provided trace flag 610 is enabled. Whether the nonclustered index inserts are minimally logged depends on the query plan selected by the query optimizer. If the plan uses a separate iterator for the nonclustered index, and that iterator has its DMLRequestSort property set to true, the nonclustered index inserts will be minimally logged, provided the other conditions mentioned previously are met.
Building nonclustered indexes separately
The advantages of doing this are:
- Clustered index inserts can be minimally logged without enabling TF 610
- CREATE INDEX is minimally logged if the recovery model is not FULL
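For example, with the recovery model set to SIMPLE or BULK_LOGGED, a separate index build after the base table load is itself minimally logged (index and column names here are illustrative):

```sql
CREATE NONCLUSTERED INDEX IX_Table1_Col2
ON dbo.Table1 (Col2)
WITH (SORT_IN_TEMPDB = ON);  -- keep the index sort off the data files
```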
Avoiding physical reads
Ideally, the data to be imported will be stored on a separate machine, or at least on separate physical storage from that used to host the database.
The database server should have enough memory to hold the largest base table in cache, with enough left over for sorting operations necessary when building nonclustered indexes.
A good pattern is to fast-load the base table (minimally-logged clustered index load) and then to build all nonclustered indexes for that table while its data pages are still cached.
The question outlines a process whereby base tables are loaded first, and nonclustered indexes are built afterward. The cursor definition does not use an ORDER BY clause to at least group nonclustered index builds on the same table together.
The likely result is that data pages for different tables are repeatedly read into cache and then discarded as nonclustered indexes are built in a non-deterministic order.
The cost of repeated physical reads completely dominates the benefits of minimal logging gained by building nonclustered indexes separately. This explains why you found that loading tables with existing indexes is faster (because all nonclustered indexes for a given table are maintained before moving on to the next table).
Summary
The import process should be reworked to bulk-load one table at a time. This means loading the table and building all nonclustered indexes before moving on to the next one. The SQL Server instance should have enough memory available to hold the largest table and perform the largest nonclustered index sort at the same time.
You could also try enabling TF 610 before loading the data into tables with nonclustered indexes already in place. This is not usually as fast as the previous method, but it may be fast enough.
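A sketch of that approach; note that trace flag 610 applies to SQL Server 2008 through 2014 (later versions changed the minimal-logging behavior), so check the documentation for your version before relying on it:

```sql
DBCC TRACEON (610, -1);    -- enable globally before the load
-- ... run the import ...
DBCC TRACEOFF (610, -1);
```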
See the following for more information:
The Data Loading Performance Guide
Operations That Can Be Minimally Logged
Best Answer
If you are expecting a rebuild to make the data files in the file system smaller, this is not how it works.
In fact, rebuilding can even lead to larger data files on disk, as new pages and extents are allocated to hold the copy of the data. When the old pages are dropped from the index, they're not actually removed from anything, they're just deallocated.
You've rebuilt your indexes and saved some space inside the data file. This means that as you add more data (to this and other tables), the data file doesn't immediately have to grow to make room for the new data.
This is a good thing. Growing a file is an expensive, blocking operation, and you want to minimize it. Shrinking the file to temporarily get back some disk space is a lost cause: the data file is just going to grow again later, and then what are you going to do with the disk space you got back for a little while? Lease it out to someone and then evict them when your database has to grow again? Just leave the file at the size it has grown to now, because unless you mark the database read-only, it will more than likely grow to that size again.
The data file is just a container. Left alone, it will only ever grow. It won't shrink just because you've deleted some rows, rebuilt some indexes, or dropped a table. SQL Server doesn't release that disk space and shrink the file for you, for the same reason it won't release memory back to the operating system just because you've dropped an index that happened to be loaded into the buffer pool: it assumes you're going to use that space/memory again, and it knows it's expensive to keep releasing and re-claiming it. So it just holds onto it.
And, in almost all cases, you're going to want it that way. If you really need to reclaim some space that you know you won't use again, see this post.