Query Execution Plan Issues Until Statistics Updated – SQL Server

execution-plansql serversql-server-2016statistics

I hope you guys can help me here.
Our application polls a message table every 3 seconds looking for notifications to send out. This works well on all our Customers (single-tenant DB) except one. They will have no activity for 23hours a day and then load thousands of messages all at once (3000+). In other cases, this volume is nothing and we can easily deal with it, except in this case, the SQL query below takes approximately 30 seconds to run and gets worse as the queue backs up as it does an update, require an exclusive lock and therefore blocking all the other queries and so the issues causes all kinds of havoc.
This is all due to a bad query plan.

We have daily reindexing that runs every morning at 5am (reorganise < 30%, rebuild > 30%, ignore <5%) as well as updates statistics. These are both from the Ola Hallengren maintenance solution.
We are also on SQL Server 2016 and are fully up to date (13.0.5492.2)

I don't have the 2 plans handy, but basically the bad plan goes and does a full table scan of the MessagesSent table (3.5m rows).

My theory is that because the query returns nothing all day, certain parts are not executed and therefore the bad query is the most efficient query for SQL.

This will continue after flushing the plan for the query as it just generates the same plan, however when I UPDATE STATISTICS on the MessagesSent table, the good plan is created and all is healthy, with the query executing in about 10-30ms.

Does anyone know how I can fine-tune this to always use the better plan even if no data exsists for the query to return?
As a hotfix, we have added option recompile to the application but I don't feel that's the ideal solution for a query that is executing ever 3 seconds.

Here is the query :

WITH TopMessage
    AS
    (
        SELECT TOP 1 ID, BatchID FROM MessagesSent 
        JOIN Units ON Unit = idUnit 
        WHERE   MessageDate <= GETDATE() 
          AND         Active = 'True' 
          AND         Status = 'Queued' 
          AND NOT(DialString = 'null') 
          AND           Unit = ('29') 
          AND System in ('SystemName', 'Q1', '') 
        ORDER BY 
            CASE 
                WHEN QPriority IS NULL 
                    THEN 
                        CASE 
                            WHEN DefaultPriority IS NULL 
                                THEN 999999 
                                ELSE DefaultPriority 
                        END
                 ELSE QPriority 
            END ASC,
            Retries ASC, 
            MessageDate ASC, 
            ID ASC
    ),
    BatchCalls
    AS
    (
        SELECT * FROM MessagesSent 
        WHERE (
                 (LEN(BatchID) > 0 
                  AND BatchID = (SELECT TOP 1 BatchID FROM TopMessage)
                 ) 
        OR ID = (SELECT TOP 1 ID FROM TopMessage)
        )
        AND Status = 'Queued' AND Active = 'True'
    )

    UPDATE BatchCalls
    SET LastUpdated = @dtNow
    OUTPUT INSERTED.*
    WHERE Status = 'Queued'

Thank you very much for your time and for looking.

Best Answer

You could use a manual plan guide to enforce the desired plan.

Alternatively, you could use Query Store to enforce plan guides via the GUI.

Also, you could run an update statistics job after the big load of messages. I wrote a post on my blog showing an easy way to do that.