Sql-server – How to create clustered index on 100 GB table

clustered-indexindexsql serversql-server-2008

I have a heap table that takes about 104 GB of disk space with almost 3 billion rows. I am trying to create a clustered index on this table on the [WeekEndingDate] column. I have about 200 gb's free in the data file and about 280 gb's free in the tempdb.

I have tried two different methods. First was to create the index directly on the table with the following command:

CREATE CLUSTERED INDEX CX_WT_FOLD_HISTORY
ON WT_FOLD_HISTORY (WeekEndingDate ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = ON, 
IGNORE_DUP_KEY = OFF
, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, 
DATA_COMPRESSION = PAGE)

I tried it both with SORT_IN_TEMPDB = ON and OFF. When using ON it filled up the tempdb and with OFF it filled the data drive.

Other method was to create a new blank table with the needed index and then insert the records from the heap into the new table. This failed as well after filling up the data drive.

Any other suggestions on what to do. Most things I've read stated that I would need about 1.2 times the size of the table to be used as a workspace while create the index. I have way more than that and it still fails. Any suggestions would be appreciated.

Here is my original heap table structure:

CREATE TABLE [dbo].[WT_FOLD_HISTORY](
[WeekEndingDate] [varchar](50) NULL,
[Division] [varchar](50) NULL,
[Store] [varchar](50) NULL,
[SKUNumber] [varchar](50) NULL,
[UPC] [varchar](50) NULL,
[SalesUnits] [varchar](50) NULL,
[SalesCost] [varchar](50) NULL,
[SalesRetail] [varchar](50) NULL,
[InventoryUnits] [varchar](50) NULL,
[InventoryCost] [varchar](50) NULL,
[InventoryRetail] [varchar](50) NULL,
[OnOrderUnits] [varchar](50) NULL,
[OnOrderCost] [varchar](50) NULL,
[OnOrderRetail] [varchar](50) NULL,
[ReceiptUnits] [varchar](50) NULL,
[ReceiptCost] [varchar](50) NULL,
[ReceiptRetail] [varchar](50) NULL,
[PermanentMarkdowns] [varchar](50) NULL,
[ReturnsToVendor] [varchar](50) NULL,
[POSMarkdowns] [varchar](50) NULL,
[TimeFK] [smallint] NULL,
[LocationFK] [int] NULL,
[ItemFK] [int] NULL
) ON [AcademySports_DataFG1]

Best Answer

If you've got a short-term need for disk space, one option would be to:

  1. Shrink tempdb temporarily, freeing up as much space on that drive as seems safe.
  2. Create a secondary data file for the DB the table is in on the tempdb drive.
  3. Add the clustered index to the table.
  4. Shrink the secondary file by migrating all data out of it.
  5. Remove the secondary file.
  6. Make sure the tempdb file is allowed to grow to its former size.
  7. Rebuild indexes in the table's DB (the removal of the secondary file will have caused some fragmentation).

NOTE: as others have suggested, I'd only do this after things like temporarily removing non-clustered indexes from the table in question. This in particular will allow the addition of the clustered index to go faster, as the non-clustered indexes would all have to be rebuilt anyway (with a clustered index in place, the index key is used to locate the rows in the table itself).

That's actually another point - how wide is the key on the clustered index? If you do have non-clustered indexes, and the key on the clustered index is significantly wider than the pointer into the heap was, then the non-clustered indexes will consume more space after the clustered index is created.

If the cluster key consists of several columns, or even one large column (say, a varchar column with an average length of 25 or more), you may want to consider a surrogate key instead (usually a monotonically increasing value, for best INSERT performance.