Sql-server – How to create clustered index on 100 GB table

clustered-indexindexsql serversql-server-2008

I have a heap table that takes about 104 GB of disk space with almost 3 billion rows. I am trying to create a clustered index on this table on the [WeekEndingDate] column. I have about 200 gb's free in the data file and about 280 gb's free in the tempdb.

I have tried two different methods. First was to create the index directly on the table with the following command:

CREATE CLUSTERED INDEX CX_WT_FOLD_HISTORY
ON WT_FOLD_HISTORY (WeekEndingDate ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = ON, 
IGNORE_DUP_KEY = OFF
, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, 
DATA_COMPRESSION = PAGE)

I tried it both with SORT_IN_TEMPDB = ON and OFF. When using ON it filled up the tempdb and with OFF it filled the data drive.

Other method was to create a new blank table with the needed index and then insert the records from the heap into the new table. This failed as well after filling up the data drive.

Any other suggestions on what to do. Most things I've read stated that I would need about 1.2 times the size of the table to be used as a workspace while create the index. I have way more than that and it still fails. Any suggestions would be appreciated.

Here is my original heap table structure:

CREATE TABLE [dbo].[WT_FOLD_HISTORY](
[WeekEndingDate] [varchar](50) NULL,
[Division] [varchar](50) NULL,
[Store] [varchar](50) NULL,
[SKUNumber] [varchar](50) NULL,
[UPC] [varchar](50) NULL,
[SalesUnits] [varchar](50) NULL,
[SalesCost] [varchar](50) NULL,
[SalesRetail] [varchar](50) NULL,
[InventoryUnits] [varchar](50) NULL,
[InventoryCost] [varchar](50) NULL,
[InventoryRetail] [varchar](50) NULL,
[OnOrderUnits] [varchar](50) NULL,
[OnOrderCost] [varchar](50) NULL,
[OnOrderRetail] [varchar](50) NULL,
[ReceiptUnits] [varchar](50) NULL,
[ReceiptCost] [varchar](50) NULL,
[ReceiptRetail] [varchar](50) NULL,
[PermanentMarkdowns] [varchar](50) NULL,
[ReturnsToVendor] [varchar](50) NULL,
[POSMarkdowns] [varchar](50) NULL,
[TimeFK] [smallint] NULL,
[LocationFK] [int] NULL,
[ItemFK] [int] NULL
) ON [AcademySports_DataFG1]

Best Answer

If you've got a short-term need for disk space, one option would be to:

Shrink tempdb temporarily, freeing up as much space on that drive as seems safe.
Create a secondary data file for the DB the table is in on the tempdb drive.
Add the clustered index to the table.
Shrink the secondary file by migrating all data out of it.
Remove the secondary file.
Make sure the tempdb file is allowed to grow to its former size.
Rebuild indexes in the table's DB (the removal of the secondary file will have caused some fragmentation).

NOTE: as others have suggested, I'd only do this after things like temporarily removing non-clustered indexes from the table in question. This in particular will allow the addition of the clustered index to go faster, as the non-clustered indexes would all have to be rebuilt anyway (with a clustered index in place, the index key is used to locate the rows in the table itself).

That's actually another point - how wide is the key on the clustered index? If you do have non-clustered indexes, and the key on the clustered index is significantly wider than the pointer into the heap was, then the non-clustered indexes will consume more space after the clustered index is created.

If the cluster key consists of several columns, or even one large column (say, a varchar column with an average length of 25 or more), you may want to consider a surrogate key instead (usually a monotonically increasing value, for best INSERT performance.

Related Solutions

Sql-server – PAD_INDEX and FILLFACTOR on clustered Identity index

PAD_INDEX only applies to non leave level pages. By default those are filled full.

So specifying PAD_INDEX=ON together with FILLFACTOR is only useful when you specify a fillfactor smaller then 100%.

In your case, specifying PAD_INDEX is not necessary, since by default your leave pages are how you want them. Full.

Sql-server – Differences Between Two Different Create Index Commands

It boils down to looking what the default values are. Lets break this down:

CREATE UNIQUE NONCLUSTERED INDEX [DEID_MAP_IDX1] ON [dbo].[DEID_MAP]

nonclustered is specified here. The default (i.e. nothing specified) is nonclustered. So unless clustered is specified it will default to nonclustered. So that's the same in both scripts.

[dbo] is specified here explicitly. As for the second un-specified CREATE INDEX then it all depends on what the current user's default schema is. Only you can answer that at the moment, so that may or may not default to dbo.

WITH (
    PAD_INDEX  = OFF, 
    STATISTICS_NORECOMPUTE  = OFF, 
    IGNORE_DUP_KEY  = OFF, 
    ALLOW_ROW_LOCKS = ON, 
    ALLOW_PAGE_LOCKS = ON
) ON [PRIMARY]

PAD_INDEX: the default is OFF, so unspecified will be the same in the second script as it is in the first.

STATISTICS_NORECOMPUTE: the default is OFF, so the second script unspecified has the same value.

IGNORE_DUP_KEY: the default is OFF, so the second CREATE INDEX is identical with this parameter.

ALLOW_ROW_LOCKS: the default is ON, so the second CREATE script has the same behavior.

ALLOW_PAGE_LOCKS: the default is ON...the second script has identical behavior.

... ON [PRIMARY]: just like the default schema one, this all depends on what your default filegroup is. If PRIMARY is the default filegroup, your second CREATE INDEX script will also create the index on PRIMARY. If PRIMARY is not the default filegroup, then it will be a different filegroup, as an unspecified filegroup will go to the default filegroup.

All of this information and default values can be found on this BOL reference here.

Best Answer

Related Solutions

Sql-server – PAD_INDEX and FILLFACTOR on clustered Identity index

Sql-server – Differences Between Two Different Create Index Commands

Related Question