What is the proper database design to Instantaneously find Active Flag for an OLTP table: with many rows being inserted into a parallel, multithreading environment while allowing optimization, and prevent racing conditions, locking, deadlocks, etc?
Note: This table only has inserts, no deletes or updates.
The active flag is determined by shops sending order files ( max(LastModified) column). With customer modifications and technical issues, sometimes shops can send old files later. So last file being sent may not always be current and have an earlier [LastModified] date.
Should we use :
(1) One table design with active flag,
(2) Or Two table design (transaction table and a current pointer table) in high insert environment?
We receive around 5000 inserts/sec and 100 million inserts in week time span. Consumes around 200GB of data, SQL2016 in 50 Cores, 200GB of RAM.
Method 1: One Table Method with Current Flag
create table dbo.CustomerOrder
(
CustomerOrderId bigint primary key identity(1,1),
CustomerId int,
CurrentFlag bit,
FileLastModified datetime, -- actual shop submittal date
Productsku varchar(25),
Quantity int,
)
Method 2: Two Table Method, Pointer Rows to Original Table
create table dbo.CustomerOrderCurrent
(
CustomerOrderCurrentKey bigint primary key identity(1,1),
CustomerId int,
CustomerOrderId bigint foreign key references CustomerOrder(CustomerOrderId) -- refers to historical table
)
Best Answer
Based on the question 'What is the proper database design':
Having a 'CurrentFlag' column 'de-normalises' the schema design a little, for the sake of identifying the current order for a customer quickly.
I've worked on databases that use this method, but it is a performance trade-off that requires 'code' (e.g. triggers) to maintain the column. This is where concurrency issues can creep in because of the extra work being done upon insert.
However, the same result can be achieved without changing the schema, by adding a descending order covering index which 'includes' the 'CustomerOrderID' in the index itself.
Testing is required for this index-only solution to ensure it is performant under the '100 million inserts in week time span' load, especially when querying the data (see end of the code sample).
Covering index example code: