SQL Server – Tuning a table for massive inserts

sql-server, sql-server-2008

I'm trying to solve a performance issue and I need advice.

I have a table that, about twice a day, gets a massive surge of inserts, and at exactly the same time everybody is trying to read from it. So I get lock contention and everything becomes painfully slow.

My performance requirements are as follows:

  1. Many writers, from different sources, and I prefer each write to be atomic.
  2. I don't care about the order of the writes; number 7 can come in before number 3.
  3. Writes should finish ASAP; I want writers to wait as little as possible.
  4. Data is never updated, only inserted.
  5. I have many, many readers.
  6. Readers want to see the most up-to-date snapshot possible at the time of their read.
  7. I can live with readers seeing delayed data (10-20 seconds).
  8. I occasionally need to run complex and large queries on this data, so I need several indexes on it.

How would you approach this, and how would you implement this with SQL Server?


I'm thinking of splitting it into two tables: one for the inserts, one for reading, and a worker process moving data between them.

The large queries would run only on the second table; readers that need up-to-date data could also query the second and then complete what they need from the first.

Best Answer

One solution I used was to thread out writes to multiple staging tables, then collect the data in batches. We put the staging tables on their own filegroups to try and segregate the I/O as much as possible, though we did end up with two tables on each available LUN, as opposed to each table getting its own dedicated I/O.

Since you said the order of writes doesn't matter, this may be viable, but I don't know whether 20 seconds will be enough to make this work without still running into a bunch of blocking.
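
As a sketch of the filegroup piece (the database name, filegroup name, file path, and size below are placeholders, not from the original setup):

ALTER DATABASE StockDB ADD FILEGROUP Staging0;

ALTER DATABASE StockDB ADD FILE
(
  NAME = N'Staging0_data',
  FILENAME = N'E:\SQLData\Staging0_data.ndf',
  SIZE = 512MB
) TO FILEGROUP Staging0;

-- each staging table (created below) can then be placed on its own filegroup:
-- CREATE TABLE staging.StockData0 ( ... ) ON Staging0;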

So let's say we have a base table like this, which is where everyone reads from (and is currently writing to):

CREATE TABLE dbo.StockData
(
  StockID INT,
  ...other info...
);

Then we have staging tables like this, where we're going to write instead:

CREATE TABLE staging.StockData0
(
  StockID INT,
  ...other info...
);

CREATE TABLE staging.StockData1
(
  StockID INT,
  ...other info...
);

... up to StockData9

*Don't bother mimicking all of the non-clustered indexes, constraints, etc. on these staging tables; keep them as lean as possible so the writes stay fast.
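
If you'd rather not write those ten CREATE TABLE statements out by hand, a small dynamic SQL loop can create the staging schema and all ten lean tables; the column list here is just the placeholder from above:

CREATE SCHEMA staging AUTHORIZATION dbo;
GO

DECLARE @i INT = 0, @sql NVARCHAR(MAX);

WHILE @i <= 9
BEGIN
  SET @sql = N'CREATE TABLE staging.StockData' + CONVERT(CHAR(1), @i) + N'
  (
    StockID INT
    -- ...other info..., but no extra indexes or constraints
  );';

  EXEC sp_executesql @sql;

  SET @i += 1;
END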

Then assuming you have singleton inserts via a stored procedure, change the stored procedure to use dynamic SQL and insert into one of the 10 tables based on the StockID value:

DECLARE @sql NVARCHAR(MAX) = N'INSERT staging.StockData'
  + CONVERT(CHAR(1), @StockID % 10) + N' (StockID, cols...)
    SELECT @StockID, @params...;';

EXEC sp_executesql @sql, N'@StockID INT, @params...', @StockID, @params...;
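
Fleshed out, the whole procedure might look something like the sketch below; the procedure name and the Price column are hypothetical stand-ins for your real signature and ...other info... columns:

CREATE PROCEDURE dbo.StockData_Insert   -- hypothetical name
  @StockID INT,
  @Price   DECIMAL(18,4)                -- placeholder for your real columns
AS
BEGIN
  SET NOCOUNT ON;

  -- route the row to one of the ten staging tables based on StockID
  DECLARE @sql NVARCHAR(MAX) = N'INSERT staging.StockData'
    + CONVERT(CHAR(1), @StockID % 10) + N' (StockID, Price)
      SELECT @StockID, @Price;';

  EXEC sp_executesql @sql,
    N'@StockID INT, @Price DECIMAL(18,4)',
    @StockID, @Price;
END
GO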

Then have a background job that performs a very quick, metadata-only operation, moving these tables into a dummy schema. This minimizes the time spent blocking users' ability to write while you move the data into the real table (a process you can now control much better than sporadic writes from all over the place). So first create a second copy of the tables in a different schema:

CREATE SCHEMA shadow  AUTHORIZATION dbo;
CREATE SCHEMA holding AUTHORIZATION dbo;
GO

CREATE TABLE shadow.StockData0
(
  StockID INT,
  ...other info...
);

-- repeat for 1-9

Now the job would do this:

TRUNCATE TABLE shadow.StockData0;

BEGIN TRANSACTION;

  ALTER SCHEMA holding TRANSFER staging.StockData0;
  ALTER SCHEMA staging TRANSFER shadow.StockData0;

COMMIT TRANSACTION;

-- once the data has been copied into dbo.StockData (see the insert below),
-- move the holding table back so it becomes the next cycle's shadow copy:
ALTER SCHEMA shadow TRANSFER holding.StockData0;

-- repeat for 1-9 (I actually used dynamic SQL in a loop
-- so I wouldn't have to repeat myself ten times...)
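
As a rough sketch, the metadata-swap part of that loop might look something like the following (the final holding-to-shadow transfer and the batch inserts would run in a similar loop afterwards):

DECLARE @i INT = 0, @t CHAR(1), @sql NVARCHAR(MAX);

WHILE @i <= 9
BEGIN
  SET @t = CONVERT(CHAR(1), @i);

  -- metadata-only swap: the empty shadow copy becomes the new staging table,
  -- and the filled staging table moves to holding for the batch insert step
  SET @sql = N'TRUNCATE TABLE shadow.StockData' + @t + N';

    BEGIN TRANSACTION;
      ALTER SCHEMA holding TRANSFER staging.StockData' + @t + N';
      ALTER SCHEMA staging TRANSFER shadow.StockData' + @t + N';
    COMMIT TRANSACTION;';

  EXEC sp_executesql @sql;

  SET @i += 1;
END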

I blogged about this part of it here and here.

The job would then go on to insert batches into the primary table, with a delay in between, to allow some reads to happen.

BEGIN TRANSACTION;

  INSERT dbo.StockData(StockID, ...cols...)
    SELECT StockID, ...cols...
    FROM holding.StockData0;

COMMIT TRANSACTION;

WAITFOR DELAY '00:00:01';

-- repeat for 1-9

This was several years ago, and something that didn't occur to me at the time (but which I remember now, after seeing the other link posted by wBob) is that using TABLOCK inside those transactions may help speed up the inserts (though you'll want to test based on volume, isolation level, etc.).
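
For example, the batch insert above could take a TABLOCK hint like this (treat it as something to benchmark, not a guaranteed win):

BEGIN TRANSACTION;

  INSERT dbo.StockData WITH (TABLOCK) (StockID, ...cols...)
    SELECT StockID, ...cols...
    FROM holding.StockData0;

COMMIT TRANSACTION;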

It might also be useful (even with all this work) to enable Read Committed Snapshot Isolation, and/or, if you are on Enterprise Edition, to consider partitioning (and aligning the staging tables to those partitions).
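
Enabling RCSI is a single database-level change; the database name below is a placeholder, and WITH ROLLBACK IMMEDIATE will roll back open transactions, so schedule it accordingly:

ALTER DATABASE StockDB SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;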