Sql-server – SQL Server – same query occasionally taking 30x longer

performancequery-performancesql server

I've been trying to debug a performance issue with one of our .NET web applications. It appears that a certain query usually runs quickly, but occasionally is very slow, resulting in the page taking >10 sec to load. I've run a SQL trace and extracted the query.

Running the query in SSMS I see exactly the same behaviour. Usually I see something like this:

EventClass          | TextData        | CPU  | Reads  | Writes  | Duration
--------------------------------------------------------------------------
SQL:BatchStarting   | SELECT .....    |      |        |         |
SQL:StmtStarting    | SELECT .....    |      |        |         |
SQL:StmtCompleted   | SELECT .....    |   62 |    403 |       0 |     475
SQL:BatchCompleted  | SELECT .....    |   62 |    403 |       0 |     475

but occasionally (perhaps 5% of the time) it takes about 15 seconds to run, and I see something like this:

EventClass          | TextData        | CPU  | Reads  | Writes  | Duration
--------------------------------------------------------------------------
SQL:BatchStarting   | SELECT .....    |      |        |         |
SQL:StmtStarting    | SELECT .....    |      |        |         |
...
...
SQL:StmtStarting    | select ms_ticks |      |        |         |
SQL:StmtStarting    | select ms_ticks |      |        |         |
SQL:StmtCompleted   | select ms_ticks |    0 |      0 |       0 |       0
SQL:BatchCompleted  | select ms_ticks |    0 |    158 |       0 |       0
...
...
SQL:StmtCompleted   | SELECT .....    |   62 |    403 |       0 |   14690
SQL:BatchCompleted  | SELECT .....    |   62 |    749 |       0 |   14707

In between the StmtStarting and StmtCompleted for my query I see queries like this being run every 250ms or so:

select ms_ticks from sys.dm_os_sys_info

I presume that this is not relevant and just something added by SSMS.
The strange this is that SSMS insists that the SELECT statement is only taking .25 seconds:

Execution plan

I should note that this is running on my PC. I don't have high CPU or disk usage and no other applications are using the database whilst I'm repeatedly running this query.

The query itself looks like this (it looks a little strange as it's generated by EntityFramework):

SELECT 
[Project9].[QUE_ID] AS [QUE_ID], 
    ...
    (more columns)
    ...
FROM ( SELECT 
    [Distinct1].[QUE_ID] AS [QUE_ID], 
    ...
    (more columns)
    ...
    FROM   (SELECT DISTINCT 
        [Filter1].[QUE_ID1] AS [QUE_ID], 
        [Filter1].[CLS_ID] AS [CLS_ID], 
        [Filter1].[QUE_Name] AS [QUE_Name], 
        [Filter1].[QUE_CatchAll] AS [QUE_CatchAll], 
        [Filter1].[QUE_Order] AS [QUE_Order], 
        [Filter1].[QUE_isQueue] AS [QUE_isQueue], 
        [Filter1].[EMT_ID] AS [EMT_ID], 
        [Filter1].[QUE_isAdmin] AS [QUE_isAdmin], 
        [Filter1].[SLA_ID] AS [SLA_ID]
        FROM   (SELECT [Extent1].[QUE_ID] AS [QUE_ID1], [Extent1].[CLS_ID] AS [CLS_ID], [Extent1].[QUE_Name] AS [QUE_Name], [Extent1].[QUE_CatchAll] AS [QUE_CatchAll], [Extent1].[QUE_Order] AS [QUE_Order], [Extent1].[QUE_isQueue] AS [QUE_isQueue], [Extent1].[EMT_ID] AS [EMT_ID], [Extent1].[QUE_isAdmin] AS [QUE_isAdmin], [Extent1].[SLA_ID] AS [SLA_ID], [Extent2].[ROL_ID] AS [ROL_ID]
            FROM  [dbo].[QUE_Queue] AS [Extent1]
            INNER JOIN [dbo].[PRM_Permission] AS [Extent2] ON [Extent1].[QUE_ID] = [Extent2].[QUE_ID]
            WHERE [Extent1].[QUE_isQueue] = 1 ) AS [Filter1]
        INNER JOIN [dbo].[ROL_Role] AS [Extent3] ON [Filter1].[ROL_ID] = [Extent3].[ROL_ID]
        WHERE  EXISTS (SELECT 
            1 AS [C1]
            FROM  (SELECT 
                N'Group1' AS [C1]
                FROM  ( SELECT 1 AS X ) AS [SingleRowTable1]
            UNION ALL
                SELECT 
                N'Group2' AS [C1]
                FROM  ( SELECT 1 AS X ) AS [SingleRowTable2]
            UNION ALL
                SELECT 
                N'Group3' AS [C1]
                FROM  ( SELECT 1 AS X ) AS [SingleRowTable3]
            UNION ALL
                SELECT 
                N'NGroup4' AS [C1]
                FROM  ( SELECT 1 AS X ) AS [SingleRowTable4]
            UNION ALL
                SELECT 
                N'Group5' AS [C1]
                FROM  ( SELECT 1 AS X ) AS [SingleRowTable5]
            UNION ALL
                SELECT 
                N'Group6' AS [C1]
                FROM  ( SELECT 1 AS X ) AS [SingleRowTable6]) AS [UnionAll5]
            WHERE [UnionAll5].[C1] = [Extent3].[ROL_Name]
        ) ) AS [Distinct1]
    LEFT OUTER JOIN [dbo].[EVE_Event] AS [Extent4] ON [Distinct1].[QUE_ID] = [Extent4].[QUE_ID]
)  AS [Project9]
ORDER BY [Project9].[QUE_ID] ASC, [Project9].[CLS_ID] ASC, [Project9].[QUE_Name] ASC, [Project9].[QUE_CatchAll] ASC, [Project9].[QUE_Order] ASC, [Project9].[QUE_isQueue] ASC, [Project9].[EMT_ID] ASC, [Project9].[QUE_isAdmin] ASC, [Project9].[SLA_ID] ASC, [Project9].[C1] ASC

Approx size of tables:

Queue: 10
Permission: 100
Role: 10
Event: 10000
(All have clustered indexes)

I'm at a bit of a loss as to what could be going on or how I can find out what is taking up all of that extra time.

Top wait types based on query suggested below:

Wait_Type                       Wait_Time_Seconds   Waiting_Tasks_Count Percentage_WaitTime
TRACEWRITE                      22088.985000        69107               97.850477511451889
RESOURCE_SEMAPHORE              219.884000          27                  0.974049029284418
ASYNC_NETWORK_IO                104.392000          158093              0.462438950833435
THREADPOOL                      33.040000           29256               0.146361626710252
WRITELOG                        29.749000           23519               0.131783051846347
PREEMPTIVE_XE_CALLBACKEXECUTE   21.091000           22484222            0.093429572304659
PAGEIOLATCH_SH                  19.044000           30045               0.084361707598972
PARALLEL_REDO_WORKER_WAIT_WORK  18.639000           2013                0.082567625915629
LCK_M_S                         15.062000           92                  0.066722119295091
LCK_M_IX                        10.824000           69                  0.047948494174085
IO_COMPLETION                   3.486000            7475                0.015442391970700

Best Answer

Here are a few reasons why query is running slow sometimes even though page reads/writes are the same.

  1. The table may be locked which are accessed in the query. You may check that with sp_who2 or sp_whoisactive.
  2. Autogrowth for your database may have kicked in during that time. Change autogrowth to a higher value.
  3. Index maintenance or statistics update might have kicked in for any tables used in query.

Well, these are the starting points for you to investigate.

One of my clients had a similar situation and they were running out of the threads and there was a huge wait released to thread pool for them. So honestly, there can be many reasons for this but try out the first three points listed in the blog post and also post your top waits based on the query listed here: https://blog.sqlauthority.com/2017/08/29/sql-server-wait-stats-collection-scripts-2016-later-versions/