Sql-server – SQL Server – same query occasionally taking 30x longer

performancequery-performancesql server

I've been trying to debug a performance issue with one of our .NET web applications. It appears that a certain query usually runs quickly, but occasionally is very slow, resulting in the page taking >10 sec to load. I've run a SQL trace and extracted the query.

Running the query in SSMS I see exactly the same behaviour. Usually I see something like this:

EventClass          | TextData        | CPU  | Reads  | Writes  | Duration
--------------------------------------------------------------------------
SQL:BatchStarting   | SELECT .....    |      |        |         |
SQL:StmtStarting    | SELECT .....    |      |        |         |
SQL:StmtCompleted   | SELECT .....    |   62 |    403 |       0 |     475
SQL:BatchCompleted  | SELECT .....    |   62 |    403 |       0 |     475

but occasionally (perhaps 5% of the time) it takes about 15 seconds to run, and I see something like this:

EventClass          | TextData        | CPU  | Reads  | Writes  | Duration
--------------------------------------------------------------------------
SQL:BatchStarting   | SELECT .....    |      |        |         |
SQL:StmtStarting    | SELECT .....    |      |        |         |
...
...
SQL:StmtStarting    | select ms_ticks |      |        |         |
SQL:StmtStarting    | select ms_ticks |      |        |         |
SQL:StmtCompleted   | select ms_ticks |    0 |      0 |       0 |       0
SQL:BatchCompleted  | select ms_ticks |    0 |    158 |       0 |       0
...
...
SQL:StmtCompleted   | SELECT .....    |   62 |    403 |       0 |   14690
SQL:BatchCompleted  | SELECT .....    |   62 |    749 |       0 |   14707

In between the StmtStarting and StmtCompleted for my query I see queries like this being run every 250ms or so:

select ms_ticks from sys.dm_os_sys_info

I presume that this is not relevant and just something added by SSMS.
The strange this is that SSMS insists that the SELECT statement is only taking .25 seconds:

I should note that this is running on my PC. I don't have high CPU or disk usage and no other applications are using the database whilst I'm repeatedly running this query.

The query itself looks like this (it looks a little strange as it's generated by EntityFramework):

SELECT 
[Project9].[QUE_ID] AS [QUE_ID], 
    ...
    (more columns)
    ...
FROM ( SELECT 
    [Distinct1].[QUE_ID] AS [QUE_ID], 
    ...
    (more columns)
    ...
    FROM   (SELECT DISTINCT 
        [Filter1].[QUE_ID1] AS [QUE_ID], 
        [Filter1].[CLS_ID] AS [CLS_ID], 
        [Filter1].[QUE_Name] AS [QUE_Name], 
        [Filter1].[QUE_CatchAll] AS [QUE_CatchAll], 
        [Filter1].[QUE_Order] AS [QUE_Order], 
        [Filter1].[QUE_isQueue] AS [QUE_isQueue], 
        [Filter1].[EMT_ID] AS [EMT_ID], 
        [Filter1].[QUE_isAdmin] AS [QUE_isAdmin], 
        [Filter1].[SLA_ID] AS [SLA_ID]
        FROM   (SELECT [Extent1].[QUE_ID] AS [QUE_ID1], [Extent1].[CLS_ID] AS [CLS_ID], [Extent1].[QUE_Name] AS [QUE_Name], [Extent1].[QUE_CatchAll] AS [QUE_CatchAll], [Extent1].[QUE_Order] AS [QUE_Order], [Extent1].[QUE_isQueue] AS [QUE_isQueue], [Extent1].[EMT_ID] AS [EMT_ID], [Extent1].[QUE_isAdmin] AS [QUE_isAdmin], [Extent1].[SLA_ID] AS [SLA_ID], [Extent2].[ROL_ID] AS [ROL_ID]
            FROM  [dbo].[QUE_Queue] AS [Extent1]
            INNER JOIN [dbo].[PRM_Permission] AS [Extent2] ON [Extent1].[QUE_ID] = [Extent2].[QUE_ID]
            WHERE [Extent1].[QUE_isQueue] = 1 ) AS [Filter1]
        INNER JOIN [dbo].[ROL_Role] AS [Extent3] ON [Filter1].[ROL_ID] = [Extent3].[ROL_ID]
        WHERE  EXISTS (SELECT 
            1 AS [C1]
            FROM  (SELECT 
                N'Group1' AS [C1]
                FROM  ( SELECT 1 AS X ) AS [SingleRowTable1]
            UNION ALL
                SELECT 
                N'Group2' AS [C1]
                FROM  ( SELECT 1 AS X ) AS [SingleRowTable2]
            UNION ALL
                SELECT 
                N'Group3' AS [C1]
                FROM  ( SELECT 1 AS X ) AS [SingleRowTable3]
            UNION ALL
                SELECT 
                N'NGroup4' AS [C1]
                FROM  ( SELECT 1 AS X ) AS [SingleRowTable4]
            UNION ALL
                SELECT 
                N'Group5' AS [C1]
                FROM  ( SELECT 1 AS X ) AS [SingleRowTable5]
            UNION ALL
                SELECT 
                N'Group6' AS [C1]
                FROM  ( SELECT 1 AS X ) AS [SingleRowTable6]) AS [UnionAll5]
            WHERE [UnionAll5].[C1] = [Extent3].[ROL_Name]
        ) ) AS [Distinct1]
    LEFT OUTER JOIN [dbo].[EVE_Event] AS [Extent4] ON [Distinct1].[QUE_ID] = [Extent4].[QUE_ID]
)  AS [Project9]
ORDER BY [Project9].[QUE_ID] ASC, [Project9].[CLS_ID] ASC, [Project9].[QUE_Name] ASC, [Project9].[QUE_CatchAll] ASC, [Project9].[QUE_Order] ASC, [Project9].[QUE_isQueue] ASC, [Project9].[EMT_ID] ASC, [Project9].[QUE_isAdmin] ASC, [Project9].[SLA_ID] ASC, [Project9].[C1] ASC

Approx size of tables:

Queue: 10
Permission: 100
Role: 10
Event: 10000
(All have clustered indexes)

I'm at a bit of a loss as to what could be going on or how I can find out what is taking up all of that extra time.

Top wait types based on query suggested below:

Wait_Type                       Wait_Time_Seconds   Waiting_Tasks_Count Percentage_WaitTime
TRACEWRITE                      22088.985000        69107               97.850477511451889
RESOURCE_SEMAPHORE              219.884000          27                  0.974049029284418
ASYNC_NETWORK_IO                104.392000          158093              0.462438950833435
THREADPOOL                      33.040000           29256               0.146361626710252
WRITELOG                        29.749000           23519               0.131783051846347
PREEMPTIVE_XE_CALLBACKEXECUTE   21.091000           22484222            0.093429572304659
PAGEIOLATCH_SH                  19.044000           30045               0.084361707598972
PARALLEL_REDO_WORKER_WAIT_WORK  18.639000           2013                0.082567625915629
LCK_M_S                         15.062000           92                  0.066722119295091
LCK_M_IX                        10.824000           69                  0.047948494174085
IO_COMPLETION                   3.486000            7475                0.015442391970700

Best Answer

Here are a few reasons why query is running slow sometimes even though page reads/writes are the same.

The table may be locked which are accessed in the query. You may check that with sp_who2 or sp_whoisactive.
Autogrowth for your database may have kicked in during that time. Change autogrowth to a higher value.
Index maintenance or statistics update might have kicked in for any tables used in query.

Well, these are the starting points for you to investigate.

One of my clients had a similar situation and they were running out of the threads and there was a huge wait released to thread pool for them. So honestly, there can be many reasons for this but try out the first three points listed in the blog post and also post your top waits based on the query listed here: https://blog.sqlauthority.com/2017/08/29/sql-server-wait-stats-collection-scripts-2016-later-versions/

Related Solutions

Sql-server – sp_ExecuteSQL, performance and table variables

There is an implicit CREATE TABLE at the beginning and DROP TABLE @T at the end as the table goes out of scope that aren't associated with either the INSERT or the SELECT statement.

If you replace with a #temp table you should see some additional reads for both the drop and create statements. For me I see 36 for the CREATE and 100 for the DROP so that accounts for 136 of your missing 137 reads.

(Script with #temp table)

EXEC sp_executesql N'
   CREATE TABLE #T 
  ( 
     x INT 
  ) 

INSERT INTO #T 
            (x) 
SELECT TOP 10 number 
FROM   master..spt_values 

SELECT * 
FROM   #T 
UNION 
SELECT * 
FROM   #T 

DROP TABLE #T
'

Sql-server – SHOWPLAN does not display a warning but “Include Execution Plan” does for the same query

This:

SET SHOWPLAN_XML ON;
GO
SELECT * FROM sys.objects;
GO

Is equivalent to pressing Display Estimated Execution Plan on the toolbar (or hitting Ctrl + L). You'll notice that no rows are returned from the query, like there is when you use Include Actual Execution Plan (Ctrl + M).

The spill warning is only a runtime warning. There is no way that SQL Server can know, when displaying the estimated plan, that a spill will happen at runtime. This is because a spill is caused by factors that might only be present during certain invocations of the query (for example, when there is memory pressure). The estimated plan knows roughly how much memory it's going to ask for, but it can't know until execution that it isn't going to get it.

As an aside, may I recommend* our free tool, SQL Sentry Plan Explorer? I think it provides much more obvious information than Management Studio. I recently wrote a lengthy blog post that can act as a tutorial, and Jonathan Kehayias has a great PluralSight course on it as well.

_{* Disclaimer: I work for SQL Sentry.}

Best Answer

Related Solutions

Sql-server – sp_ExecuteSQL, performance and table variables

Sql-server – SHOWPLAN does not display a warning but “Include Execution Plan” does for the same query

Related Question