Sql-server – Unbelievably slow and unusable Query Store

performancequery-performancequery-storesql-server-2017

I'm beginning with Query Store and it's with problems 🙁 No report of Top Resource Consuming Queries is ever generated. It just keeps saying "Waiting" for hours and hours. Query on the background (see below for an example) is not able to finish and is consuming lot of CPU. No matter how am I changing configuration options, it keeps waiting and waiting (even last hour report). Is this the reality of Query Store all over? Sounds being such a great tool but is totally unusable in real?

My Query Store size is about 1,7 GB. I'm collecting 7 day behind, in 1 hour interval and Auto Capture mode – so it seems to be reasonable settings, to me.

This is an example of a query on the background which is never finished:

SELECT TOP (@results_row_count)
    p.query_id query_id,
    q.object_id object_id,
    ISNULL(OBJECT_NAME(q.object_id),'') object_name,
    qt.query_sql_text query_sql_text,
    ROUND(CONVERT(float, SUM(rs.avg_duration*rs.count_executions))*0.001,2) total_duration,
    SUM(rs.count_executions) count_executions,
    COUNT(distinct p.plan_id) num_plans
FROM sys.query_store_runtime_stats rs
    JOIN sys.query_store_plan p ON p.plan_id = rs.plan_id
    JOIN sys.query_store_query q ON q.query_id = p.query_id
    JOIN sys.query_store_query_text qt ON q.query_text_id = qt.query_text_id
WHERE NOT (rs.first_execution_time > @interval_end_time OR rs.last_execution_time < @interval_start_time)
GROUP BY p.query_id, qt.query_sql_text, q.object_id
HAVING COUNT(distinct p.plan_id) >= 1
ORDER BY total_duration DESC

Any ideas how to make it operational?

EDITORIAL:
I have opened a case with Microsoft about this. We didn't solve it but came to a workaround. If I change Query Store Capture Mode to All (not using Auto mode) – then things go fine and the reports are generated in a timely fashion.

EDITORIAL #2
This has come to a completely different issue. The problem is not the Capture Mode but Legacy Cardinality Estimator which the database is running under.
Please vote for this Azure Feedback: https://feedback.azure.com/forums/908035-sql-server/suggestions/37971781-slow-or-non-operational-query-store-under-legacy-c

Best Answer

I have the same issue, and upon investigating, discovered the root issue in slow Query Store queries is repeated access of the in-memory TVFs. It doesn't matter how many rows are returned from these TVFs, only the number of times they are accessed. The fix (until MS fixes underlying stuff) will be whatever you can do to nudge your plans away from reading them multiple times. Some more information in this related question and in my blog post.

Ultimately, I was able to fix this with a plan guide in production, using hints that I found moved the optimizer away from repeated TVF executions. Since the exact query text is required, and the query rarely finishes, I built a script that would create a plan guide while the report is running by pulling the text from exec_requests. Follow the below steps to create a plan guide.

  1. Open up Top Resource Consuming Queries
  2. Run the below script while the report is still spinning
  3. Cancel the report and run again.
--USE YourDB
DECLARE @badqry NVARCHAR(MAX)

SELECT TOP (1) @badqry = SUBSTRING(st.text, (r.statement_start_offset/2) + 1,  
    ((CASE statement_end_offset   
        WHEN -1 THEN DATALENGTH(st.text)  
        ELSE r.statement_end_offset 
    END - r.statement_start_offset)/2) + 1)
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) st
WHERE st.text LIKE '(@results_row_count int,@interval_start_time%'

IF @badqry IS NULL
RAISERROR('Missed the plan',16,1)

EXEC sp_create_plan_guide
@name = N'Fixing QueryStore Top Duration',
@stmt = @badqry,
@type = N'SQL',
@module_or_batch = NULL,
@params = N'@results_row_count int,@interval_start_time datetimeoffset(7),@interval_end_time datetimeoffset(7)',
@hints = N'OPTION (HASH JOIN, LOOP JOIN)'

You should be able to run the above script for most of the canned Top Resource queries, but make sure to change the plan guide name if you do. Also, there's certainly a better hint out there than HASH JOIN, LOOP JOIN, but it's worked well enough for me so far.