T-SQL – Table architecture for many inserts per second in SQL Server 2019

availability-groups, filegroups, partitioning, t-sql

We are developing a project which will process around 51,000,000 records daily (around 500-600 inserts per second). We chose an Always On solution which will contain:

  1. One Primary Server
  2. Three replicas which have to work in synchronous mode

The main table of this project will contain 45 columns with several data types. As I wrote above, we will have to insert around 500-600 rows per second into this table (independently; that is, a procedure insertMainTable which inserts one row into this table will be executed 500-600 times per second).

To achieve this throughput we decided to create this table as a partitioned table. Our idea is to create 32 partitions based on a smallint column in this table. For this solution we plan to create 16 filegroups with 32 files (so one filegroup will contain 2 files). This way every partition will be in a different file, and every 2 files will be on a different filegroup. I tested this solution and posted the results below:
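The layout described above can be sketched in T-SQL roughly as follows. All object and filegroup names (pfMainTable, psMainTable, FG01-FG16) are hypothetical; the filegroups must already exist, and note that SQL Server places partitions on filegroups, not on individual files within them:

```sql
-- Hypothetical names throughout; FG01..FG16 are assumed to already exist.
-- 31 boundary values produce 32 partitions on the smallint key.
CREATE PARTITION FUNCTION pfMainTable (smallint)
AS RANGE RIGHT FOR VALUES
    ( 1,  2,  3,  4,  5,  6,  7,  8,
      9, 10, 11, 12, 13, 14, 15, 16,
     17, 18, 19, 20, 21, 22, 23, 24,
     25, 26, 27, 28, 29, 30, 31);

-- Map the 32 partitions onto 16 filegroups, two partitions per filegroup.
CREATE PARTITION SCHEME psMainTable
AS PARTITION pfMainTable
TO (FG01, FG01, FG02, FG02, FG03, FG03, FG04, FG04,
    FG05, FG05, FG06, FG06, FG07, FG07, FG08, FG08,
    FG09, FG09, FG10, FG10, FG11, FG11, FG12, FG12,
    FG13, FG13, FG14, FG14, FG15, FG15, FG16, FG16);

-- Place the table on the scheme via its smallint partitioning column.
CREATE TABLE dbo.MainTable
(
    PartitionKey smallint NOT NULL,
    Id           bigint IDENTITY(1,1) NOT NULL
    -- ... remaining columns ...
) ON psMainTable (PartitionKey);
```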

https://stackoverflow.com/questions/60339635/sql-server-2019-always-on-performance

My question is:

Do you think this kind of solution is sufficient, or too complicated compared to the goals? What would you recommend?

Best Answer

I have some experience with high throughput tables, partitioning and Availability Groups, so I feel confident enough to post this as an answer, although the best I can really do is raise points for you to consider.

First - your 3 synchronous replicas are going to be the slowest link in your chain. Every commit is going to have to wait for all three to confirm before the client receives the commit acknowledgment. I don't know what else to tell you here except: make sure you are on a fast network and fast disks.
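For reference, the commit behavior is controlled per replica; a sketch with hypothetical availability group and server names showing how one replica could be taken out of the synchronous-commit wait:

```sql
-- Hypothetical AG and replica names. Each SYNCHRONOUS_COMMIT replica
-- must harden the log before the primary acknowledges the commit;
-- switching a replica to ASYNCHRONOUS_COMMIT removes it from that wait
-- (at the cost of possible data loss on failover to that replica).
ALTER AVAILABILITY GROUP MainAG
MODIFY REPLICA ON N'ReplicaServer3'
WITH (AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT);
```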

Second - I don't think you need the multiple files to make your solution fast, and they may actually slow things down. If you move items between partitions that are in the same filegroup, that's fairly fast and metadata-only. However, if each partition has its own file/filegroup and you move things around, the system has to pick that data up and physically move it instead of just updating metadata.
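For example, partition switching - the metadata-only way to move a partition's rows into another table - requires the source partition and the target table to be on the same filegroup. A sketch with hypothetical table names:

```sql
-- Hypothetical tables. dbo.MainTable_Staging must have an identical
-- structure and, crucially, must reside on the same filegroup as
-- partition 5 of dbo.MainTable for the switch to succeed as a
-- metadata-only operation.
ALTER TABLE dbo.MainTable
    SWITCH PARTITION 5 TO dbo.MainTable_Staging;
```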

Third - What is the goal/point of your partitioning? You should be able to run many hundreds of inserts per second against a single, unpartitioned table. And SQL Server 2019 has improvements for highly concurrent inserts that avoid the last-page hot spot issue. Have you tested it without all the partitioning overhead?
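The SQL Server 2019 improvement alluded to here is the OPTIMIZE_FOR_SEQUENTIAL_KEY index option, which reduces last-page insert contention on an ever-increasing key. A minimal sketch with hypothetical names:

```sql
-- Hypothetical table. OPTIMIZE_FOR_SEQUENTIAL_KEY (new in SQL Server 2019)
-- reduces last-page ("hot spot") contention under highly concurrent
-- inserts into an ever-increasing clustered key.
CREATE TABLE dbo.MainTableUnpartitioned
(
    Id        bigint IDENTITY(1,1) NOT NULL,
    CreatedAt datetime2 NOT NULL DEFAULT SYSUTCDATETIME(),
    Payload   varchar(200) NULL,
    CONSTRAINT PK_MainTableUnpartitioned
        PRIMARY KEY CLUSTERED (Id)
        WITH (OPTIMIZE_FOR_SEQUENTIAL_KEY = ON)
);
```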

Are you going to be updating this data? Are you going to be purging the data later? If you are purging the data later then http://kejser.org/table-pattern-rotating-log-ring-buffer/ may be of use to you. I've implemented that pattern in my production environment to great success.

What are the datatypes for this 45-column table? If you are using VARCHAR(MAX), XML or other large datatypes then there may not be much that can be done without re-architecting the table.

Partitioning isn't really great at speeding up table access. Its main benefit is allowing rapid maintenance activities, such as quickly loading new data or discarding old data.
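For example, discarding old data can be done per partition almost instantly on SQL Server 2016 and later (table name and partition numbers below are hypothetical):

```sql
-- Removes all rows in partitions 1 through 4 as a minimally logged,
-- near-instant operation; no row-by-row DELETE required.
TRUNCATE TABLE dbo.MainTable WITH (PARTITIONS (1 TO 4));
```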