SQL Server – Best Way to Configure and Maintain a Table with Many Insertions and Deletions

Tags: bulk-insert, delete, index-maintenance, performance, performance-tuning, sql-server

There is a "batch query" table with two columns, BatchID and RecordID, both of type uniqueidentifier. The unique clustered key is on (BatchID, RecordID), and there are no secondary indexes.
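For reference, the table looks roughly like this (the table and constraint names are illustrative, not the real ones):

CREATE TABLE dbo.BatchQuery
(
    BatchID  uniqueidentifier NOT NULL,
    RecordID uniqueidentifier NOT NULL,
    CONSTRAINT PK_BatchQuery PRIMARY KEY CLUSTERED (BatchID, RecordID)
);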

We use SqlBulkCopy to quickly insert thousands or millions of ids into the table under a single batch id, so that various queries can join against it and filter on the set of record ids associated with that batch. That approach is much faster than running multiple queries that each send long lists of ids in the query text. As soon as the queries complete, the batch of record ids is deleted. On average the table is empty when no queries are running, but it may hold hundreds of active batches, since we process thousands of API requests per second that may use the table.
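As an illustration, the per-batch usage pattern is roughly the following (TargetTable is a placeholder for whatever table is being filtered):

-- @BatchID is the id the application used for the SqlBulkCopy insert.
DECLARE @BatchID uniqueidentifier = NEWID();

SELECT t.*
FROM dbo.TargetTable AS t
INNER JOIN dbo.BatchQuery AS b
    ON b.RecordID = t.RecordID
WHERE b.BatchID = @BatchID;

-- Once all queries for the batch have finished:
DELETE FROM dbo.BatchQuery
WHERE BatchID = @BatchID;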

My question is: what kind of maintenance steps, if any, does a DBA need to perform on this table? For example, would it benefit at all from periodic index rebuilds, or will SQL Server handle cleanup of the deleted rows (freeing pages, etc.) well enough on its own? Also, is there any special configuration of the table or clustered index that would help, especially considering that the identifiers are random GUIDs (uniqueidentifiers)?

Generally we use the latest version of SQL Server, Enterprise Edition. Some servers are in Azure, some in AWS.

I'm also interested in hearing alternatives. For example, I assume this approach is best with a permanent table, but I could instead create a session-local temp table that goes away when the work is finished, rather than inserting and deleting rows in a permanent table. I just don't know how that would perform in comparison.
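A minimal sketch of that alternative, assuming SqlBulkCopy targets the temp table on the same connection (all names here are placeholders):

CREATE TABLE #BatchRecords
(
    RecordID uniqueidentifier NOT NULL PRIMARY KEY CLUSTERED
);

-- SqlBulkCopy can write to "#BatchRecords" as long as it uses the same connection/session.

SELECT t.*
FROM dbo.TargetTable AS t
INNER JOIN #BatchRecords AS b
    ON b.RecordID = t.RecordID;

-- Gone automatically when the session ends, or drop it explicitly:
DROP TABLE #BatchRecords;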

Best Answer

In my experience with SQL Server 2016 Enterprise Edition in a similar process, where a permanent table is used as a message queue - so it has a very high rate of inserts and deletes - we found rebuilding the indexes every 5 to 10 minutes to be beneficial in forcing cleanup of the deleted records. This dramatically improved read performance on the table.
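The rebuild itself was nothing special; something along these lines on a schedule would do (ONLINE = ON is an Enterprise Edition feature, and the table name here is just an example):

ALTER INDEX ALL ON [transport].[BatchTable] REBUILD WITH (ONLINE = ON);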

In our case, the database isolation level is read committed snapshot, and the dequeue/delete process was deleting records faster than the ghost cleanup process was clearing them out. With high ghost record counts, selecting one record would sometimes take 1-3 seconds instead of 5-10 ms, because SQL Server had to read past all the ghost records, indicated by a high Scan Count (with SET STATISTICS IO ON).
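For illustration, the symptom can be observed like this (table and column names are placeholders):

SET STATISTICS IO ON;

DECLARE @BatchID uniqueidentifier = NEWID();

SELECT TOP (1) RecordID
FROM [transport].[BatchTable]
WHERE BatchID = @BatchID;
-- STATISTICS IO reports scan count and logical reads per table; logical reads far above
-- what a single-row seek should need suggest pages full of ghost records.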

To check the number of ghost records:

SELECT ghost_record_count, version_ghost_record_count, *
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('[transport].[BatchTable]'), NULL, NULL, 'SAMPLED');

https://docs.microsoft.com/en-us/sql/relational-databases/ghost-record-cleanup-process-guide?view=sql-server-2017#ghost-record-cleanup-task

The ghost cleanup process runs automatically on an interval (every 5 seconds for SQL Server 2012+, every 10 seconds for SQL Server 2008/2008R2) and checks to see if any pages have been marked with ghost records. If it finds any, then it goes and deletes the records that are marked for deletion, or ghosted, touching at most 10 pages with each execution.

On high-load systems with many deletes, the ghost cleanup process can cause a performance issue by keeping pages in the buffer pool and generating IO.

I think the "at most 10 pages" part was our limiting factor, as the high rate of deletes exceeded the number of pages the cleanup process could touch on each run.