Sql-server – How to find when Query Store flushes stale query data? Extended Events not working

query-storesql server

I searched online far and wide and didn't see anyone else posting about this so I thought I would. I am trying to audit when the query store does its cleanup and is purged from the query store.

There are two extended event sessions I found that I thought were the events but I have no data being picked up even after the query store has purged stale query data:

  1. query_store_db_cleanup_started
  2. query_store_db_cleanup_finished

Wondering if anyone else has had the same issue and is as curious as I when the query store purged its data?

Below is the code I have for setting up my current audit:

   CREATE EVENT SESSION [QueryStore_Cleanup_Audit]
ON SERVER
    ADD EVENT qds.query_store_db_cleanup__finished
    (ACTION
     (
         package0.last_error,
         package0.process_id,
         sqlos.cpu_id,
         sqlos.system_thread_id,
         sqlos.task_time,
         sqlserver.client_app_name,
         sqlserver.client_connection_id,
         sqlserver.client_hostname,
         sqlserver.client_pid,
         sqlserver.context_info,
         sqlserver.database_id,
         sqlserver.database_name,
         sqlserver.nt_username,
         sqlserver.plan_handle,
         sqlserver.server_instance_name,
         sqlserver.session_id,
         sqlserver.session_nt_username,
         sqlserver.sql_text,
         sqlserver.transaction_id,
         sqlserver.username
     )
     WHERE ([database_id] = (5))
    ),
    ADD EVENT qds.query_store_db_cleanup__started
    (ACTION
     (
         package0.last_error,
         package0.process_id,
         sqlos.cpu_id,
         sqlos.system_thread_id,
         sqlos.task_time,
         sqlserver.client_app_name,
         sqlserver.client_connection_id,
         sqlserver.client_hostname,
         sqlserver.client_pid,
         sqlserver.context_info,
         sqlserver.database_id,
         sqlserver.database_name,
         sqlserver.nt_username,
         sqlserver.plan_handle,
         sqlserver.server_instance_name,
         sqlserver.session_id,
         sqlserver.session_nt_username,
         sqlserver.sql_text,
         sqlserver.transaction_id,
         sqlserver.username
     )
     WHERE ([database_id] = (5))
    )
WITH
(
    MAX_MEMORY = 40096KB,
    EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS,
    MAX_DISPATCH_LATENCY = 30 SECONDS,
    MAX_EVENT_SIZE = 0KB,
    MEMORY_PARTITION_MODE = NONE,
    TRACK_CAUSALITY = OFF,
    STARTUP_STATE = ON
);
GO




query_store_db_cleanup_started

Edit – 10/8/18 – I was able to finally track cleanup events via the
EE! Below is my code as i was missing the "query_store_plan_removal" and "query_store_size_retention_cleanup_update" EE events:

CREATE EVENT SESSION [QueryStore_Cleanup_Audit] ON SERVER
ADD EVENT qds.query_store_db_cleanup__finished
(ACTION
 (
     package0.last_error,
     package0.process_id,
     sqlos.cpu_id,
     sqlos.system_thread_id,
     sqlos.task_time,
     sqlserver.client_app_name,
     sqlserver.client_connection_id,
     sqlserver.client_hostname,
     sqlserver.client_pid,
     sqlserver.context_info,
     sqlserver.database_id,
     sqlserver.database_name,
     sqlserver.nt_username,
     sqlserver.plan_handle,
     sqlserver.server_instance_name,
     sqlserver.session_id,
     sqlserver.session_nt_username,
     sqlserver.sql_text,
     sqlserver.transaction_id,
     sqlserver.username
 )
),
ADD EVENT qds.query_store_db_cleanup__started
(ACTION
 (
     package0.last_error,
     package0.process_id,
     sqlos.cpu_id,
     sqlos.system_thread_id,
     sqlos.task_time,
     sqlserver.client_app_name,
     sqlserver.client_connection_id,
     sqlserver.client_hostname,
     sqlserver.client_pid,
     sqlserver.context_info,
     sqlserver.database_id,
     sqlserver.database_name,
     sqlserver.nt_username,
     sqlserver.plan_handle,
     sqlserver.server_instance_name,
     sqlserver.session_id,
     sqlserver.session_nt_username,
     sqlserver.sql_text,
     sqlserver.transaction_id,
     sqlserver.username
 )
),
ADD EVENT qds.query_store_execution_runtime_info_discarded
(ACTION
 (
     sqlos.task_time,
     sqlserver.database_id,
     sqlserver.database_name,
     sqlserver.plan_handle,
     sqlserver.query_hash,
     sqlserver.session_id,
     sqlserver.sql_text,
     sqlserver.username
 )
),
ADD EVENT qds.query_store_flush_failed
(ACTION
 (
     sqlserver.database_id,
     sqlserver.database_name,
     sqlserver.nt_username,
     sqlserver.plan_handle,
     sqlserver.query_hash,
     sqlserver.query_plan_hash_signed,
     sqlserver.sql_text,
     sqlserver.username
 )
),
ADD EVENT qds.query_store_plan_removal
(ACTION
 (
     sqlos.task_time,
     sqlserver.database_id,
     sqlserver.database_name,
     sqlserver.nt_username,
     sqlserver.plan_handle,
     sqlserver.query_plan_hash_signed,
     sqlserver.sql_text,
     sqlserver.username
 )
),
ADD EVENT qds.query_store_size_retention_cleanup_finished
(ACTION
 (
     sqlos.task_time,
     sqlserver.database_id,
     sqlserver.database_name,
     sqlserver.nt_username,
     sqlserver.plan_handle,
     sqlserver.query_hash,
     sqlserver.query_plan_hash_signed,
     sqlserver.username
 )
),
ADD EVENT qds.query_store_size_retention_cleanup_skipped
(ACTION
 (
     sqlos.task_time,
     sqlserver.database_id,
     sqlserver.database_name,
     sqlserver.nt_username,
     sqlserver.query_plan_hash_signed,
     sqlserver.session_id,
     sqlserver.sql_text,
     sqlserver.username
 )
),
ADD EVENT qds.query_store_size_retention_cleanup_started
(ACTION
 (
     sqlos.task_time,
     sqlserver.database_id,
     sqlserver.database_name,
     sqlserver.nt_username,
     sqlserver.plan_handle,
     sqlserver.query_hash,
     sqlserver.query_plan_hash_signed,
     sqlserver.username
 )
),
ADD EVENT qds.query_store_size_retention_cleanup_update
(ACTION
 (
     sqlos.task_time,
     sqlserver.database_id,
     sqlserver.database_name,
     sqlserver.nt_username,
     sqlserver.plan_handle,
     sqlserver.query_plan_hash_signed,
     sqlserver.sql_text,
     sqlserver.username
 )
),
ADD EVENT qds.query_store_size_retention_query_deleted
(ACTION
 (
     sqlos.task_time,
     sqlserver.database_id,
     sqlserver.database_name,
     sqlserver.nt_username,
     sqlserver.plan_handle,
     sqlserver.query_hash,
     sqlserver.sql_text,
     sqlserver.username
 )
) WITH (
MAX_MEMORY = 40096KB,
EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS,
MAX_DISPATCH_LATENCY = 30 SECONDS,
MAX_EVENT_SIZE = 0KB,
MEMORY_PARTITION_MODE = NONE,
TRACK_CAUSALITY = OFF,
STARTUP_STATE = ON ); GO

Best Answer

There are a few ways to determine when the purge is happening.

Extended Events

I've confirmed on one of my servers that the the "stale data purge" events do, indeed, fire whenever the stale data purge kicks off.

I would confirm you have data that qualifies, as the XE appears to bail out if there is no data to be purged. You can check on that with a query like this one:

USE [Your_Database];
GO

DECLARE @threshold AS INT;
SELECT @threshold = dbqso.stale_query_threshold_days 
FROM sys.database_query_store_options dbqso

SELECT * 
FROM sys.query_store_runtime_stats_interval qsrsi
WHERE 
    qsrsi.start_time < DATEADD(dd, -@threshold, CAST(GETDATE() AS DATE))
ORDER BY qsrsi.start_time;

Since my database has a interval_length_minutes of 60, this query returns up to 25 rows as I approach the time the XE fires (one for each hour in the calendar day being purged, plus the "last hour" from the previous calendar day):

query store results - before

And here's the last cleanup event from my XE session (2018-10-02 20:04:24.161):

  <event name="query_store_db_cleanup__started" package="qds" timestamp="2018-10-02T20:04:24.161Z">
    <data name="database_id">
      <type name="uint32" package="package0"></type>
      <value>6</value>
    </data>
    <action name="collect_system_time" package="package0">
      <type name="filetime" package="package0"></type>
      <value>2018-10-02T20:04:24.160Z</value>
    </action>
  </event>

And then I just wait for the next event, which occurs about 24 hours later (2018-10-03 20:04:59.441 so it was 35 seconds "late"):

  <event name="query_store_db_cleanup__started" package="qds" timestamp="2018-10-03T20:04:59.441Z">
    <data name="database_id">
      <type name="uint32" package="package0"></type>
      <value>6</value>
    </data>
    <action name="collect_system_time" package="package0">
      <type name="filetime" package="package0"></type>
      <value>2018-10-03T20:04:59.442Z</value>
    </action>
  </event>

And now the query returns one results (just the last hour of the calendar day being purged):

query store results - after

So it appears this XE works as advertised, but it does not fire when there is no data to be purged. So I would double check your data, and your purge threshold settings, to make sure that you actually have data that's being cleaned up!


Some other options for looking out for the stale data purge are listed below.

Transaction Log Backup Size

I noticed the purge occurring at 1:30am on one of my non-prod boxes because the hourly transaction log backup was unusually large. Since the purge is mainly just deleting rows from the query store tables in the user database where query store is enabled, the operation is logged like a normal query.

Watching for the Queries

Since I happened to be logging the contents of dm_exec_query_stats I was able to correlate the increase in transaction log records with several queries that started like this:

DELETE sys.plan_persist_plan WHERE query_id = -- etc, you get it

So you could monitor for queries that start with "DELETE sys.plan_persist*" using whatever tools you have available to identify when the purge occurs.

Controlling the Timing

As you mentioned and noticed, it doesn't seem like there is any way to configure or control when the stale data purge occurs. Of course, that shouldn't normally be a problem. But it does seem odd to me. For what it's worth, on my machine it appears to happen once a day, at the same time every day. So once the time is established, it seems to be deterministic.