SQL Server – Best Way to Configure and Maintain a Table with Many Insertions and Deletions

Tags: bulk-insert, delete, index-maintenance, performance, performance-tuning, sql-server

There is a "batch query" table with two columns, BatchID and RecordID, both of type uniqueidentifier. The unique clustered key is on (BatchID, RecordID), and there are no secondary indexes.
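For reference, the table looks roughly like this (the table and constraint names are illustrative, not the real ones):

CREATE TABLE dbo.BatchQuery
(
    BatchID  uniqueidentifier NOT NULL,
    RecordID uniqueidentifier NOT NULL,
    CONSTRAINT PK_BatchQuery PRIMARY KEY CLUSTERED (BatchID, RecordID)
);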

We use SqlBulkCopy to quickly insert thousands or millions of ids into the table under a single batch id, so that various queries can join against it and filter on the set of record ids associated with that batch. That approach is much faster than running multiple queries that each send long lists of ids in the query text. As soon as the queries complete, the batch of record ids is deleted. On average the table is empty when no queries are running, but it may hold hundreds of active batches, since we process thousands of API requests per second that may use the table.
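As an illustration, the per-batch usage pattern is roughly the following (TargetTable is a placeholder for whatever table is being filtered):

-- @BatchID is the id the application used for the SqlBulkCopy insert.
DECLARE @BatchID uniqueidentifier = NEWID();

SELECT t.*
FROM dbo.TargetTable AS t
INNER JOIN dbo.BatchQuery AS b
    ON b.RecordID = t.RecordID
WHERE b.BatchID = @BatchID;

-- Once all queries for the batch have finished:
DELETE FROM dbo.BatchQuery
WHERE BatchID = @BatchID;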

My question is: what kind of maintenance steps, if any, does a DBA need to perform on this table? For example, would it benefit at all from periodic index rebuilds, or will SQL Server handle cleanup of the deleted rows (freeing pages, etc.) well enough on its own? Also, is there any special configuration of the table or clustered index that would help, especially considering that the identifiers are random GUIDs (uniqueidentifiers)?

Generally we use the latest version of SQL Server, Enterprise Edition. Some servers are in Azure, some in AWS.

I'm also interested in hearing alternatives. For example, I assume this approach is best with a permanent table, but I could instead create a session-local temp table that goes away when the work is finished, rather than inserting and deleting rows in a permanent table. I just don't know how that would perform in comparison.
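A minimal sketch of that alternative, assuming SqlBulkCopy targets the temp table on the same connection (all names here are placeholders):

CREATE TABLE #BatchRecords
(
    RecordID uniqueidentifier NOT NULL PRIMARY KEY CLUSTERED
);

-- SqlBulkCopy can write to "#BatchRecords" as long as it uses the same connection/session.

SELECT t.*
FROM dbo.TargetTable AS t
INNER JOIN #BatchRecords AS b
    ON b.RecordID = t.RecordID;

-- Gone automatically when the session ends, or drop it explicitly:
DROP TABLE #BatchRecords;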

Best Answer

In my experience with SQL Server 2016 Enterprise Edition in a similar process, where a permanent table is used as a message queue - so it has a very high rate of inserts and deletes - we found rebuilding the indexes every 5 to 10 minutes to be beneficial in forcing cleanup of the deleted records. This dramatically improved read performance on the table.
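The rebuild itself was nothing special; something along these lines on a schedule would do (ONLINE = ON is an Enterprise Edition feature, and the table name here is just an example):

ALTER INDEX ALL ON [transport].[BatchTable] REBUILD WITH (ONLINE = ON);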

In our case, the database isolation level is read committed snapshot, and the dequeue/delete process was deleting records faster than the ghost cleanup process was clearing them out. With high ghost record counts, selecting one record would sometimes take 1-3 seconds instead of 5-10 ms, because SQL Server had to read past all the ghost records, indicated by a high Scan Count (with SET STATISTICS IO ON).
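For illustration, the symptom can be observed like this (table and column names are placeholders):

SET STATISTICS IO ON;

DECLARE @BatchID uniqueidentifier = NEWID();

SELECT TOP (1) RecordID
FROM [transport].[BatchTable]
WHERE BatchID = @BatchID;
-- STATISTICS IO reports scan count and logical reads per table; logical reads far above
-- what a single-row seek should need suggest pages full of ghost records.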

To check the number of ghost records:

SELECT ghost_record_count, version_ghost_record_count, *
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('[transport].[BatchTable]'), NULL, NULL, 'SAMPLED');

https://docs.microsoft.com/en-us/sql/relational-databases/ghost-record-cleanup-process-guide?view=sql-server-2017#ghost-record-cleanup-task

The ghost cleanup process runs automatically on an interval (every 5 seconds for SQL Server 2012+, every 10 seconds for SQL Server 2008/2008R2) and checks to see if any pages have been marked with ghost records. If it finds any, then it goes and deletes the records that are marked for deletion, or ghosted, touching at most 10 pages with each execution.

On high-load systems with many deletes, the ghost cleanup process can cause a performance issue by keeping pages in the buffer pool and generating IO.

I think the "at most 10 pages" part was our limiting factor, as the high rate of deletes exceeded the number of pages the cleanup process could touch on each run.