SQL Server – How much disk space do I need to add a primary key to a very large table?

sql-server, sql-server-2014

I'm using Microsoft SQL Server 2014 – 12.0.4100.1 Enterprise Edition (64-bit) on Windows NT 6.3 (Build 9600)

I have a 491 GB table with 3.6 billion rows that does not have a primary key. The primary key I wish to add will cover only a single bigint column, which contains unique values. The database has 477 GB of free space, but it seems that isn't enough.

Trying to add it fails with the following error:

Could not allocate space for object 'dbo.SORT temporary run storage:
424251614560256' in database '[Redacted]' because the 'PRIMARY'
filegroup is full. Create disk space by deleting unneeded files,
dropping objects in the filegroup, adding additional files to the
filegroup, or setting autogrowth on for existing files in the
filegroup.

The database name in the error is the DB to which I'm trying to add the PK, not 'tempdb', which is what I often see in this sort of error.

The command I'm using:

ALTER TABLE REDACTED ADD CONSTRAINT [PK_REDACTED] PRIMARY KEY CLUSTERED 
(
    [RedactedId] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

At first the command failed after 8-9 minutes, so my DBA added another data file to the DB on a drive with about 150 GB of free space. Running it a second time failed after 24 minutes. That's progress, but I have no idea how much free space is needed, or how long it will actually take to add this index. There is currently 477 GB free in this database, but it seems that's still not enough to run this sort.
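For reference, free space per data file in the database can be checked with a minimal query against the standard sys.database_files catalog view (sizes are reported in 8 KB pages, hence the division by 128 to get MB):

-- Free space per data file in the current database
SELECT name,
       size / 128 AS size_mb,
       size / 128 - CAST(FILEPROPERTY(name, 'SpaceUsed') AS int) / 128 AS free_mb
FROM sys.database_files;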

Is there any way to calculate how much space is needed to run this command on 3.6 billion bigint values? Or is there some other trick that will help this command succeed? What does that large number mean in the error?

Best Answer

I would also add that not only will data pages be moved when creating a clustered index; all the non-clustered indexes will have to be rebuilt as well, since each row in a non-clustered index contains the clustering key value. All of this data movement needs to be logged, and the sort will make heavy use of disk: by default it runs in the destination filegroup (which is exactly what your error shows), or in tempdb if you specify SORT_IN_TEMPDB = ON. Either way, you probably don't have 500 GB of memory available, right? ;)
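To get a feel for how much of that rebuild work applies to your table, you can list the non-clustered indexes and their current sizes. A minimal sketch using the standard catalog views; dbo.REDACTED stands in for your real table name:

-- Size of each non-clustered index on the table; all of them are rebuilt
-- when the heap is converted to a clustered index, and that work is logged.
SELECT i.name AS index_name,
       SUM(ps.used_page_count) * 8 / 1024 AS used_mb   -- pages are 8 KB
FROM sys.indexes AS i
JOIN sys.dm_db_partition_stats AS ps
    ON ps.object_id = i.object_id AND ps.index_id = i.index_id
WHERE i.object_id = OBJECT_ID('dbo.REDACTED')
  AND i.type = 2   -- non-clustered indexes only
GROUP BY i.name;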

You'll find plenty of debates and differing opinions about using heaps, and I'm of course speculating here, but most likely there's a reason why such a big table doesn't already have a clustered index defined on it. It's not exactly best practice, but sometimes it does make sense to define a table without a clustered index (quick, minimally logged inserts) and with a few narrow non-clustered indexes that cover the frequent queries.

The decision to add any kind of index (primary key, clustered, non-clustered, XML, spatial, ...) depends, of course, on the problem you're trying to solve. In an OLTP system you typically optimise your indexes for write performance, while in a reporting (OLAP) system you usually optimise for read-heavy workloads.

A few suggestions to wrap this up:

- A primary key doesn't have to be clustered (see the sketch below).
- If the table is not partitioned, you should definitely consider partitioning it.
- If your goal is to make certain types of queries faster (rather than to enforce uniqueness of your bigint column), consider creating an additional (or modifying an existing) non-clustered, possibly filtered, index, or an indexed view (perfect for aggregate queries).
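For example, if the goal is just the uniqueness guarantee on the bigint column, a non-clustered primary key leaves the heap in place and builds only a narrow index. A sketch reusing the (redacted) names from the question; SORT_IN_TEMPDB = ON is worth considering here, since it moves the sort out of the already-full PRIMARY filegroup:

-- Enforces uniqueness without rewriting the 491 GB heap.
-- Rough estimate: 3.6 billion rows x (8-byte key + roughly 9-12 bytes of
-- row locator and overhead) is on the order of tens of GB, not hundreds.
ALTER TABLE REDACTED ADD CONSTRAINT [PK_REDACTED] PRIMARY KEY NONCLUSTERED
(
    [RedactedId] ASC
) WITH (SORT_IN_TEMPDB = ON) ON [PRIMARY]
GO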

Hope this helps, and good luck!