SQL Server – Improving table design to improve performance

database-design, performance, sql-server

I currently have a SQL Server 2016 Standard instance running on a Windows Server 2016 VM with 4 x 2-core processors and 12 GB of RAM.

The SQL Server install is pretty standard, with no extra options configured.

I have a table called MessageStack which receives incoming messages from an ASP.NET MVC web application and a .NET TCP Server application.

I have another application that processes these messages. All applications are local to the SQL Server.

Currently the MessageStack table receives on average about 70-100 messages per second, although these can burst to upwards of 500 messages per second. The table has various columns, but the main ones are:

[ms_id] [int] IDENTITY(1,1) NOT NULL
[ms_encodedalert] [nvarchar](2000) NOT NULL
[ms_receivedtime] [datetime] NOT NULL
[ms_processing] [bit] NOT NULL 
        CONSTRAINT [DF_MessageStack_ms_processing]  DEFAULT ((0))
[ms_thread] [nvarchar](50) NULL

On this table I have a clustered index on ms_id, as this is the only column I use to identify a row:

ALTER TABLE [dbo].[MessageStack] 
    ADD CONSTRAINT [PK_MessageStack] 
    PRIMARY KEY CLUSTERED ([ms_id] ASC)
                WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, 
                      SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, 
                      ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

Now, the application that processes these messages pulls the top 1000 messages from the table, setting ms_processing to 1 and putting a name in ms_thread, and then begins processing them. It's a multi-threaded application and can manage around 100 messages a second. It passes each message to a thread, which does the processing and eventually deletes the message once done, using

DELETE FROM MessageStack 
WHERE ms_id = xxxxx
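
For completeness, the claim step at the start of the cycle is essentially an UPDATE ... OUTPUT over up to 1000 unclaimed rows. This is a simplified sketch of the idea rather than the exact production code (@threadName is a placeholder, and the READPAST hint, which lets competing readers skip rows that are already locked, is illustrative):

DECLARE @threadName nvarchar(50) = N'worker-1';  -- placeholder worker name

-- Claim a batch of unprocessed messages and return them in one statement
UPDATE TOP (1000) ms
SET ms_processing = 1,
    ms_thread = @threadName
OUTPUT inserted.ms_id, inserted.ms_encodedalert
FROM dbo.MessageStack AS ms WITH (ROWLOCK, READPAST)
WHERE ms_processing = 0;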

This has all been fine for a while, but very recently our incoming message volume has been increasing slightly, and I've found that the bigger the MessageStack table gets, the longer queries take. Looking into this a bit more using tools like sp_BlitzFirst and sp_BlitzIndex, I can see that the clustered index is causing lots of page locks and waits, which explains why my message processing time is going crazy, with single-row deletes sometimes taking 1.4 seconds!

Running Query [0]: delete from messagestack where ms_id=58550
    Query stats[1403.08100 ms][1 rows].
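
For anyone reproducing this without the Blitz scripts, the underlying numbers come straight from the standard wait-stats DMV (nothing here is specific to my setup):

SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'PAGELATCH%'
   OR wait_type LIKE 'LCK[_]M%'   -- [_] escapes the LIKE single-character wildcard
ORDER BY wait_time_ms DESC;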

I have tried removing the clustered index and instead using a UDF to randomly generate ms_id from the SEQUENCE feature, so that the ms_id values weren't close together, hoping to avoid the page locks. However, this just moved the problem elsewhere, causing the inserts to take longer. I got the idea from reading https://blogs.msdn.microsoft.com/blogdoezequiel/2013/05/23/pagelatch_ex-waits-and-heavy-inserts/
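
The rough shape of that attempt, as a simplified sketch only (it assumes ms_id is changed from IDENTITY to a plain int column; the sequence name and the scrambling multiplier are made up for illustration):

CREATE SEQUENCE dbo.seq_MessageStack AS bigint
    START WITH 1 INCREMENT BY 1;
GO

-- Multiplying by a large odd constant modulo a prime scatters consecutive
-- sequence values across the key range without producing duplicates.
DECLARE @next bigint = NEXT VALUE FOR dbo.seq_MessageStack;

INSERT INTO dbo.MessageStack (ms_id, ms_encodedalert, ms_receivedtime)
VALUES (CONVERT(int, (@next * 387420489) % 2147483647),
        N'example alert payload', GETDATE());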

The next option I tried was creating a new table called IncomingStack without an ms_id column. A separate process moved all messages from this table into the MessageStack table (which takes a table lock while it runs, as I wrapped it in a transaction), in an attempt to limit how often the clustered index was updated. At this point I realised that it's not so much my inserts causing the problem as the overall transactions/second against the MessageStack table, which at peak can be around 700-800.
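
The move itself can be done as a single atomic statement; a sketch of the pattern (the column list is assumed from the schema above, and the table lock is made explicit via a hint here):

BEGIN TRANSACTION;

-- Drain the staging table and append everything to MessageStack
-- in one statement.
DELETE FROM dbo.IncomingStack WITH (TABLOCKX)
OUTPUT deleted.ms_encodedalert, deleted.ms_receivedtime
INTO dbo.MessageStack (ms_encodedalert, ms_receivedtime);

COMMIT TRANSACTION;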

The next step I tried was turning the message processing app into a distributed application running on 2 servers at once, my theory being that if a single server can do approx. 100 messages/sec then 2 should be able to do more than that! All that happens now is that the general queries run a bit slower, so both servers settle at around 60 messages/sec, which is better but still not what I had hoped for.

The final options I haven't tried yet (rough sketches of both follow the list) are:

  1. Change ms_id to be a GUID column that uses NEWID() as a default value. I would still require an index on ms_id, but maybe the randomness would work better than using SEQUENCE and a custom UDF. This would, however, require updating a load of code and stored procedures.

  2. Set the current index to disallow page or row locks. I don't know what the impact of this would be, though.
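
Rough sketches of what each option would look like (the rebuilt table for option 1 is hypothetical, and all names are illustrative):

-- Option 1: ms_id as a random GUID, shown on a hypothetical rebuilt table
CREATE TABLE dbo.MessageStack_v2 (
    ms_id uniqueidentifier NOT NULL
        CONSTRAINT DF_MessageStack_v2_ms_id DEFAULT (NEWID()),
    ms_encodedalert nvarchar(2000) NOT NULL,
    ms_receivedtime datetime NOT NULL,
    ms_processing bit NOT NULL
        CONSTRAINT DF_MessageStack_v2_ms_processing DEFAULT ((0)),
    ms_thread nvarchar(50) NULL,
    CONSTRAINT PK_MessageStack_v2 PRIMARY KEY CLUSTERED (ms_id ASC)
);

-- Option 2: turn off page (and optionally row) locks on the existing index
ALTER INDEX PK_MessageStack ON dbo.MessageStack
SET (ALLOW_PAGE_LOCKS = OFF);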

In short, I don't know if there is a better design, hence the question. I'm pretty sure that with any option I need an index on ms_id, otherwise my code will resort to table scans to find rows.
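
For example, if the clustered index went away entirely, the single-row deletes would still need something like a plain nonclustered index to seek on (illustrative name):

CREATE UNIQUE NONCLUSTERED INDEX IX_MessageStack_ms_id
    ON dbo.MessageStack (ms_id);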

Does anyone know what the implications of my 2 options are? Is option 1 worth testing, or is it basically pointless? And for option 2, what happens if I change the page/row lock settings?

Or maybe there are other options I'm unaware of!

Thanks

Best Answer

It sounds like you want to check out Chris Adkin's posts on using the LMAX Disruptor pattern for queues in SQL Server.

His posts show how to design tables that support many thousands of messages processed per second.

One primary consideration is to create "slots" for messages that never get deleted. Deleting and creating rows is operationally expensive in SQL Server compared to simply updating data in pre-existing rows. Also, you may need to keep a single row per page if you are really serious about reducing the locking effects you're currently experiencing.
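
A minimal sketch of the slots idea under those constraints (every name here is made up, and the padding size is just one way to force a single row per 8 KB page):

-- Pre-allocated ring of slots: rows are inserted once, then only ever updated.
CREATE SEQUENCE dbo.seq_MessageSlots AS bigint START WITH 0 INCREMENT BY 1;

CREATE TABLE dbo.MessageSlots (
    slot_id int NOT NULL,
    occupied bit NOT NULL CONSTRAINT DF_MessageSlots_occupied DEFAULT ((0)),
    payload nvarchar(2000) NULL,
    -- Fixed padding wider than half a page (~8,096 usable bytes) means two
    -- rows can never share a page; oversized payloads spill to row-overflow.
    padding char(4100) NOT NULL CONSTRAINT DF_MessageSlots_padding DEFAULT (''),
    CONSTRAINT PK_MessageSlots PRIMARY KEY CLUSTERED (slot_id ASC)
);

-- Writer: address the slot from a monotonic sequence, ring-buffer style
-- (assumes slot_id 0 to 9999 were inserted once up front).
DECLARE @seq bigint = NEXT VALUE FOR dbo.seq_MessageSlots;

UPDATE dbo.MessageSlots
SET payload = N'example alert payload',
    occupied = 1
WHERE slot_id = CONVERT(int, @seq % 10000)
  AND occupied = 0;   -- no-op if the ring has wrapped onto unread data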