Sql-server – Why is the query_hash zero for EXEC statements

extended-eventssql serversql server 2014

I'm setting up a monitoring system for SQL Server using Extended Events to find heavy queries as 'production feedback' for our developers. I'm using the events sp_statement_completed and sql_statement_completed, with predicate filters on cpu_time, logical reads, etc. I was hoping to aggregate the results on database_name and query_hash like demonstrated in numerous examples all over the internet, but in the results I see that query_hash is 0 for all statements using EXEC, like in the table below (timestamp and queryhash shortened for readability).

name                       timestamp      query_hash plan_handle        statement
sql_statement_completed    2016...6414    0          050056019600764... exec Shared.dbo.SyncFirm  
sql_statement_completed    2016...9946    0          06003d00e01e730... exec spSetUserAuth @userid;  
sql_statement_completed    2016...7184    0          0600e30028c9da0... exec spSetUserAuth @userid;  
sp_statement_completed     2016...0409    9826...578 0600c00028e6aa0... SELECT obfuscated_columns FROM dbo.SomeTable
sp_statement_completed     2016...1448    8660...775 060084006d2d660... INSERT INTO dbo.SomeTable ( obfuscated_columns)  EXEC(@sql)
sql_statement_completed    2016...7752    0          0600f9006c23f03... exec spSetUserAuth @userid;  
sql_statement_completed    2016...1443    1304...641 06005a0008a9b11... select SUBQ.ontrackstatus, COUNT(SUBQ.ontrac

All results do have a value for plan_handle and they're all different, so a lot of plans are being generated. Other statements without query_hash (that I've seen) include ALTER INDEX, CHECKPOINT, UPDATE STATISTICS, COMMIT TRANSACTION, FETCH NEXT FROM Cursor, some INSERTs, SELECT @variable, IF(@variable = x).

Does anybody know why the query_hash is 0? I'm probably missing the point somewhere about the SQL Query Analyzer and EXEC, but I'm not able to find any clues to point me in the right direction. If the results I'm having are 'normal', then how to best aggregate the results? Wouldn't grouping by statement include literals, whitespace, etc… which is removed when calculating query_hash?

EDIT: as I see it now, EXEC SomeStoredProcedure, starts a stored procedure (obvious), and the individual statements in that stored procedure end up in the event session as sp_statement_completed events, and those all have a query_hash.

So for sp_statement_completed (i.e. 'real' queries), I can aggregate on query_hash and database_name, and for sql_statement_completed without query_hash (the EXEC SomeStoredProcedure), I can use the client_connection_id to group the statements within a specific execution of a stored procedure, to see what's the most costly part of the procedure.

Best Answer

To explain why the hash is created:

When we submit a query to the server, the algebrizer (yes that's what it is called) process creates a hash, like a coded signature, of the query. The hash is a unique identifier. An identifier is unique for any given query, including all the text that defines the query, including spaces and carriage returns, the optimizer compares the hash to queries in the cache. If a query exists in the cache that matches the query coming into the engine, the entire cost of the optimization process is skipped and the execution plan in the plan cache is reused.

EXEC starts a stored procedure which can have its code change, since SQL Server knows that it does not need to compare EXEC to optimize it, SQL Server does not create a hash.