Sql-server – SQL Concurrent Inserts and Automatically Find Active Flag Record

concurrencydatabase-designinsertoptimizationsql server

What is the proper database design to Instantaneously find Active Flag for an OLTP table: with many rows being inserted into a parallel, multithreading environment while allowing optimization, and prevent racing conditions, locking, deadlocks, etc?
Note: This table only has inserts, no deletes or updates.

The active flag is determined by shops sending order files ( max(LastModified) column). With customer modifications and technical issues, sometimes shops can send old files later. So last file being sent may not always be current and have an earlier [LastModified] date.

Should we use :

(1) One table design with active flag,

(2) Or Two table design (transaction table and a current pointer table) in high insert environment?

We receive around 5000 inserts/sec and 100 million inserts in week time span. Consumes around 200GB of data, SQL2016 in 50 Cores, 200GB of RAM.

Method 1: One Table Method with Current Flag

create table dbo.CustomerOrder
(
    CustomerOrderId bigint primary key identity(1,1),
    CustomerId int,
    CurrentFlag bit,
    FileLastModified datetime,  -- actual shop submittal date
    Productsku varchar(25),  
    Quantity int,
)

Method 2: Two Table Method, Pointer Rows to Original Table

create table dbo.CustomerOrderCurrent
(
    CustomerOrderCurrentKey bigint primary key identity(1,1),
    CustomerId int,
    CustomerOrderId bigint foreign key references  CustomerOrder(CustomerOrderId) -- refers to historical table
)

Best Answer

Based on the question 'What is the proper database design':

Having a 'CurrentFlag' column 'de-normalises' the schema design a little, for the sake of identifying the current order for a customer quickly.

I've worked on databases that use this method, but it is a performance trade-off that requires 'code' (e.g. triggers) to maintain the column. This is where concurrency issues can creep in because of the extra work being done upon insert.

However, the same result can be achieved without changing the schema, by adding a descending order covering index which 'includes' the 'CustomerOrderID' in the index itself.

Testing is required for this index-only solution to ensure it is performant under the '100 million inserts in week time span' load, especially when querying the data (see end of the code sample).

Covering index example code:

--Create a test table for the example based on given schema
USE [testDB]
GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[CustomerOrder](
    [CustomerOrderId] [bigint] IDENTITY(1,1) NOT NULL,
    [CustomerId] [int] NOT NULL,
    [FileLastModified] [datetime] NOT NULL,
    [Productsku] [varchar](25) NOT NULL,
    [Quantity] [int] NOT NULL,
 CONSTRAINT [PK_CustomerOrder] PRIMARY KEY CLUSTERED 
(
    [CustomerOrderId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

-- Create the covering index
CREATE NONCLUSTERED INDEX IX_ActiveCustomerOrder 
    ON dbo.CustomerOrder (CustomerID ASC, FileLastModified DESC) INCLUDE (CustomerOrderID);

-- Insert dummy data (with dates out of order)
INSERT INTO [dbo].[CustomerOrder] ([CustomerId] ,[FileLastModified] ,[Productsku] ,[Quantity]) 
       VALUES (100, '17-Oct-2017', 'Prod1',20);
INSERT INTO [dbo].[CustomerOrder] ([CustomerId] ,[FileLastModified] ,[Productsku] ,[Quantity]) 
       VALUES (100, '15-Oct-2017', 'Prod1',30);
INSERT INTO [dbo].[CustomerOrder] ([CustomerId] ,[FileLastModified] ,[Productsku] ,[Quantity]) 
       VALUES (100, '18-Oct-2017', 'Prod1',41);
INSERT INTO [dbo].[CustomerOrder] ([CustomerId] ,[FileLastModified] ,[Productsku] ,[Quantity]) 
       VALUES (100, '14-Oct-2017', 'Prod1',42);
INSERT INTO [dbo].[CustomerOrder] ([CustomerId] ,[FileLastModified] ,[Productsku] ,[Quantity]) 
       VALUES (100, '13-Oct-2017', 'Prod1',43);


-- Fetch the active order for a single customer using an index hint
SELECT TOP(1) CustomerID, CustomerOrderId, [FileLastModified] 
FROM dbo.CustomerOrder WITH (INDEX=IX_ActiveCustomerOrder)
    WHERE CustomerID = 100; 

-- Fetch active orders for all customers
SELECT x.CustomerID, x.CustomerOrderId, x.[FileLastModified]
FROM (
SELECT CustomerID, 
       CustomerOrderId, 
       FileLastModified, 
      (ROW_NUMBER() OVER(PARTITION BY CustomerID 
                    ORDER BY FileLastModified DESC)) AS RowID 
  FROM dbo.CustomerOrder WITH (INDEX=IX_ActiveCustomerOrder)
) x
WHERE x.rowid = 1;