Sql-server – Should I believe Query Store runtime stats

query-storesql server

There is this weird behavior of Query Store runtime stats that I observe at many of my servers with different scenarios that makes me not to believe these stats. Or am I wrong in something?

E.g. trivial query like this with a given query plan:

Stats (sys.query_store_runtime_stats) reports absolutely insane number of max_logical_io_reads at some accidental interval:

But table has only 27 pages allocated altogether!

I am experiencing this phenomenon with different queries cross many environments. And it's ruining my regressed query analysis.

Queries with different numbers of reads don't have different plans.

The heap table shown only updates and inserts rarely. It's just coincident I used a heap as an example. I experience this with clustered tables too.

Best Answer

Pathology!

This could happen to queries in ways that wouldn't impact the plans:

Heaps: Forwarded Fetches
Indexes: Internal fragmentation

Additionally some selects that cause writes may trigger additional reads if there are stats updates, spills, spools, etc. Though none of those show in your plan, someone else may some along later and find this tidbit useful.

Heaps

DROP TABLE IF EXISTS dbo.el_heapo;
CREATE TABLE dbo.el_heapo ( col1 BIGINT, col2 VARCHAR(8000) DEFAULT 'A' );

INSERT dbo.el_heapo ( col1 )
SELECT TOP 1000000
       x.rn
FROM   (   SELECT     ROW_NUMBER() OVER ( ORDER BY ( SELECT 1 / 0 )) AS rn
           FROM       sys.messages AS m
           CROSS JOIN sys.messages AS m2 ) AS x;

Run a simple query:

SELECT COUNT_BIG(*)
FROM   dbo.el_heapo;

Table 'el_heapo'. Scan count 1, logical reads 2718 
CPU time = 62 ms,  elapsed time = 62 ms.

Look at the table: No forwarded fetches.

SELECT OBJECT_NAME(ddips.object_id) AS TableName, 
       ddips.index_type_desc, 
       ddips.forwarded_record_count
FROM   sys.dm_db_index_physical_stats(DB_ID(), 
                                      OBJECT_ID('dbo.el_heapo'), 
                                      0, 
                                      NULL, 
                                      'DETAILED') AS ddips;

Cause some forwarded fetches:

UPDATE dbo.el_heapo
SET    col2 = REPLICATE('Z', 1000)
WHERE  col1 % 3 = 0;

If you run the count and forwarded fetch query again, you should see:

~330k forwarded fetches

Table 'el_heapo'. Scan count 1, logical reads 380564
 SQL Server Execution Times:
   CPU time = 359 ms,  elapsed time = 360 ms.

But the execution plan will be the same.

Indexes

Create a dummy table do I don't mess up my real Users table:

SELECT *
INTO dbo.TempUsers
FROM StackOverflow2013.dbo.Users AS u;

CREATE UNIQUE CLUSTERED INDEX c ON dbo.TempUsers(Id);

Check on the table size:

SELECT   TOP 2147483647 
         s.name AS schema_name,
         OBJECT_NAME(ps.object_id) AS table_name,         
         i.name AS index_name,
         ps.row_count,
         ps.in_row_used_page_count,
         (ps.reserved_page_count * 8. / 1024.) AS reserved_MB,
         (ps.lob_reserved_page_count * 8. / 1024.) AS reserved_LOB_MB,
         (ps.row_overflow_reserved_page_count * 8. / 1024.) AS reserved_row_overflow_MB
FROM     sys.dm_db_partition_stats AS ps
JOIN     sys.objects AS so
    ON  ps.object_id = so.object_id
    AND so.is_ms_shipped = 0
    AND so.type <> 'TF'
JOIN     sys.schemas AS s
    ON s.schema_id = so.schema_id
JOIN sys.indexes AS i
    ON ps.object_id = i.object_id
    AND ps.index_id = i.index_id
ORDER BY ps.object_id, ps.index_id, ps.partition_number

Should be 44,450 pages and 348MB.

Simple query:

SELECT COUNT(*)
FROM dbo.TempUsers AS tu

Table 'TempUsers'. Scan count 1, logical reads 44439
   CPU time = 219 ms,  elapsed time = 229 ms.

Update the DisplayName column to be the max length (nvarchar 40):

UPDATE t
SET t.DisplayName = t.DisplayName + REPLICATE('A', 40 - (DATALENGTH(t.DisplayName) / 2))
FROM dbo.TempUsers AS t

Now the table is much larger:

88,392 pages, 691MB.

Query results:

Table 'TempUsers'. Scan count 1, logical reads 88381
   CPU time = 250 ms,  elapsed time = 245 ms.

Related Solutions

SQL Server Query Store – What is an Ad-Hoc Query?

After a little bit of searching, I wasn't able to find a concrete and satisfying source from Microsoft's documentation to answer this question. There are many good third-party descriptions and definitions of adhoc/ad-hoc/ad hoc out there, but for the specificity of this question I think one close to the source is ideal.

Moving past generic (but still accurate) definitions like this SO post (thanks SqlWorldWide), if we look at what the documentation does say on the topic, it agrees with what you've mentioned about the definition being based off of the number of executions, I think to the point that we can take it as fact.

The article on optimizing for ad hoc workflows says,

When this option is set to 1, the Database Engine stores a small compiled plan stub in the plan cache when a batch is compiled for the first time, instead of the full compiled plan. This helps to relieve memory pressure by not allowing the plan cache to become filled with compiled plans that are not reused. The compiled plan stub allows the Database Engine to recognize that this ad hoc batch has been compiled before but has only stored a compiled plan stub, so when this batch is invoked (compiled or executed) again, the Database Engine compiles the batch ... and adds the full compiled plan to the plan cache.

So it would appear the the ad hoc query server configuration option also uses the definition of a single execution as the definition of ad hoc. If the query continues to get executed and generates the same plan it would no longer be treated as such.

The article on Best Practices for Query Store also aligns with this,

Compare the number of distinct query_hash values with the total number of entries in sys.query_store_query. If the ratio is close to 1 your ad-hoc workload generates different queries.

This, of course, is for queries that are not already used as stored procedures, parameterized, etc. because they can be recognized and treated appropriately right off the bat.

So based on all this we can say that a query is treated as ad hoc if:

It is not parameterized
It is not programmatically stored in the database (stored proc, function, trigger, etc.)
The same query is only executed one time OR The same query is executed multiple times but generates a different query plan for each subsequent execution.

SQL Server 2016 – Resolving Head Blocker Issues in HP 3PAR Environment

When you're troubleshooting a blocking chain happening live, start with sp_WhoIsActive:

sp_WhoIsActive @get_locks = 1

There's a blocked-by column that shows you who's blocking who. The lead blocker won't have anyone in their blocked-by column.

Once you've found the lead blocker, click on the locks column - it's an XML field that expands to show you the full list of locks held by that session.

It could be that the select is part of a longer transaction (there's also an open transactions column), or that the select isn't really what's blocking.

Your best bet for getting specific, actionable advice here would be to post a picture of the sp_WhoIsActive output, including the blocking columns, and then the contents of the XML for the locks.