SQL Server – Same Query Hash for Different Queries

execution-plansql server

Query_hash definition from sys.dm_exec_query_stats is:

query_hash: Binary hash value calculated on the query and used to identify queries with similar logic. You can use the query hash to determine the aggregate resource usage for queries that differ only by literal values.

But when I search for one particular query_hash (as shown below), I am getting multiple query texts that differ heavily in logic and size_in_bytes (for plan).

Note: I am using SQL Server 2016

DECLARE @QueryHashTest BINARY(8)
SET @QueryHashTest = CONVERT(BINARY(8), 'Ð…U¹üŒv¿')

SELECT  
        QCP.objtype
        ,qStat.query_hash,
        CONVERT(VARCHAR(100),  qStat.query_Hash) AS VARCHAR_query_hash
        ,sText.text AS QueryText
        ,QCP.size_in_bytes
        ,qStat.creation_time
        ,qp.query_plan  
FROM ( 
        SELECT query_hash,
               COUNT(query_hash) AS PlanCount
        FROM sys.dm_exec_query_stats
        GROUP BY query_hash
    ) AS MultipleQ
INNER JOIN sys.dm_exec_query_stats qStat ON MultipleQ.query_hash = qStat.query_hash
INNER JOIN sys.dm_exec_cached_plans  QCP
    ON QCP.plan_handle = qStat.plan_handle
CROSS APPLY sys.dm_exec_sql_text(qStat.sql_handle) AS sText
CROSS APPLY sys.dm_exec_query_plan(qStat.plan_handle) AS qp
WHERE PlanCount > 1
AND QCP.objtype = 'Proc'
AND qStat.query_hash= @QueryHashTest 
AND (size_in_bytes >= 2179072 OR size_in_bytes <= 262144)
ORDER BY size_in_bytes DESC

Result
enter image description here

Why are the unrelated queries showing same query_hash? Is it a bug or is there something unexpected going on in the database?

Best Answer

While hash collisions are possible, I suspect what's happening is you're pulling the text for the whole batch, not the particular statement. A procedure with multiple statements can get a cache entry for each statement. And if some statements are identical across procedures, they'll have the same query hash. Let me demonstrate:

USE TestDB
GO

--DROP TABLE dbo.employees
CREATE TABLE dbo.employees (
ID INT IDENTITY PRIMARY KEY,
title VARCHAR(20),
salary TINYINT,
has_been_fired_yet BIT)

INSERT dbo.employees
VALUES ('Senior DBA',30,0),('Grumpy DBA',50,0)
GO

CREATE OR ALTER PROC dbo.Payroll
AS

SELECT 'Your total payroll is '+CONVERT(VARCHAR(10),SUM(salary))+' query bucks'
FROM dbo.employees
WHERE has_been_fired_yet = 0
GO

CREATE OR ALTER PROC dbo.RightSizing
AS

UPDATE dbo.employees
SET has_been_fired_yet = 1
WHERE salary > 40

SELECT 'Your total payroll is '+CONVERT(VARCHAR(10),SUM(salary))+' query bucks'
FROM dbo.employees
WHERE has_been_fired_yet = 0
GO

--EXEC to get them in the cache
EXEC dbo.Payroll
EXEC dbo.RightSizing

Use the below query to see the how the same statement that is in each proc shows up separately.

SELECT
st.text AS batch_text,
SUBSTRING(st.text, (qs.statement_start_offset/2) + 1,  
    ((CASE statement_end_offset   
        WHEN -1 THEN DATALENGTH(st.text)  
        ELSE qs.statement_end_offset 
    END - qs.statement_start_offset)/2) + 1
) AS statement_text,
qs.execution_count,
qs.query_hash
FROM sys.dm_exec_query_stats AS qs  
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as st 
WHERE st.objectid IN (OBJECT_ID('dbo.Payroll'),OBJECT_ID('dbo.RightSizing'))

query results If you use the query in your question, it fails to pick up the identical statement, since it looks at only the batch text.