Sql-server – Clustered Index taking huge amount of space

clustered-indexsql server

I have been working on indexing a database dealing with historical data and recently encountered an issue with a clustered index that surprised me.

The database is SQL Server 2018. The table itself deals with account information and is essentially a fact table, it is populated with around 200k records a day and has 200 columns. Currently it is sitting at 68mil records, and before the index it took up around 120GB of space.

Because it is a historical table on account information, and we do not have UUID's set up (which would be a massive effort my boss told me not to spend my time on with this database), I created the primary key + clustered index on the Date field and Account Number with a statement equivalent to:

ALTER TABLE AcctHistory 
ADD CONSTRAINT PK_AcctHistory PRIMARY KEY CLUSTERED (Date, Account_Number)

The Date field is a Datetime field (This is a DEV environment that I didn't set up but am now building out. Just know I plan on updating it to a date soon, but I am providing the type in case that is relevant.), the Account_Number field is a bigint.

After running this statement, according to the SSMS disk space by table report, the amount of space the table consumed shot up to 480GB, 4x what it took before the index.

I know indexes take up space, but I have never heard of anything like this. Is this normal or is this indicative of a problem I can solve to save space?

Best Answer

Changing a table from a heap to a clustered table doesn't "grow the data". I.e., the clustered index doesn't magically require more space just because it is organized as a linked list (sorted by some column combination) compared to a heap.

However, things to watch out for:

Internal fragmentation (less than 100% full pages). If you had a low value for fillfactor configured for the instance (sp_configure) or specified a low value in the CREATE INDEX/ALTER TABLE command, then that can explain it. So, check your page fullness using sys_dm_db_index_physical_stats().

If you are comparing apples to oranges. Are you looking at only the clustered index or at the table level (i.e., including all the other indexes). The other indexes (non-clustered, NC) have as a "row-pointer" the RID for a heap, which is 8 byte. The row-pointer from an NC index for a clustered table is the clustering key. In your case it is 16 bytes (it would have been 4 extra bytes for duplicates if it wasn't unique). I.e., the NC indexes will now be a bit bigger than before. Also, then converting the table from a heap to a clustered index, all NC indexes were rebuilt, and I think the fillfactor will be picked up from the server setting. Could that explain it? I very much doubt it, but perhaps in some extreme case... Assuming you include the NC index size when you do the comparison. So, make sure that you are looking at the right thing (I prefer my sp_indexinfo, which gives pretty details output, but anything that you know an trust).

My bet is on an sp_configure for fillfactor which is weirdly low...