We have a SQL Server solution that has a table dsStaging.Audit
that stores audit records created by a third party transactional database. We use these audits to synchronise CRUD operations from the third party system into our SQL database.
CREATE TABLE [dsStaging].[Audit](
[SyncExecutionId] [bigint] NOT NULL,
[AuditDataGuid] [nvarchar](56) NOT NULL,
[AuditDate] [datetime] NOT NULL,
[AuditDateTimeZone] [datetimeoffset](3) NULL,
[AuditEventGroup] [nvarchar](56) NOT NULL,
[TransactionId] [bigint] NOT NULL,
[TransactionSequence] [int] NOT NULL,
.
...
.
CONSTRAINT [PK_Audit] PRIMARY KEY CLUSTERED
(
[SyncExecutionId] ASC,
[TransactionId] ASC,
[TransactionSequence] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
)
When the audits are processed, I want to move the audit records into a separate table Processed.Audit
, ready for deletion after x days.
CREATE TABLE [Processed].[Audit](
[SyncExecutionId] [bigint] NOT NULL,
[AuditDataGuid] [nvarchar](56) NOT NULL,
[AuditDate] [datetime] NOT NULL,
[AuditEventGroup] [nvarchar](56) NOT NULL,
[TransactionId] [bigint] NOT NULL,
[TransactionSequence] [int] NOT NULL,
.
...
.
CONSTRAINT [PK_Processed_Audit] PRIMARY KEY NONCLUSTERED
(
[AuditDate] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
)
My main aim in moving audits out of staging and into processed is performance. I need to make sure that the staging table is locked for as short a time as possible, so that any unprocessed audits can be processed as quickly as possible (fewer audits in the staging table = much faster processing).
We're looking at around 1.5m audit records going through this process every hour in batches of about 10k.
The process for moving audits will fire roughly every 20-30 seconds. The process for deleting Processed.Audit
records will execute every hour and delete 1 hour's worth of audits from X days ago (typically around 7 days).
- Should I turn the
Processed.Audit
table into a clustered index?
Minimum version to be supported: SQL Server 2012 Standard Edition.
Best Answer
The main reason I would want a clustered index in this scenario is this line:
When you delete rows from a HEAP, data pages may not be deallocated unless the delete gets a table lock, or you provide a
WITH (TABLOCK)
hint to the delete query. You can probably imagine what that does to concurrency, though. Not good.Note that the
TABLOCK
hint will not have this behavior if you're using RCSI or Snapshot Isolation.Here's a quick example. Load up a small table:
Run a sanity check query to figure out how many pages are assigned to the heap, and to the nonclustered PK:
Results in this:
So, 74 pages in the heap, 7 pages in the NC PK.
Do some singleton deletes to clear out the table:
If you re-run the sanity check query, you'll get the same result.
Worse, if you query the table now, SQL will read ALL OF THOSE BLANK PAGES!
So now not only is our table artificially large, but SQL now has a bunch of blank pages on disk and in memory and in backups and in DBCC CHECKDB and... well, you get the point.
Heh heh heh! No fun.
Other options for getting pages deallocated from the heap are:
Which doesn't work for you, because you need to batch delete data.
Which would be painful for you at that table size, because it will rebuild all nonclustered indexes on the table at the same time.
Will the table re-use pages? Sometimes maybe sorta kinda.
Sanity check:
SELECT * query:
Hope this helps!