SQL Server Performance – Why Data Retrieval Takes Over 4 Hours for 400K Records

azure-sql-database query-performance sql-server ssms

I have a table with 15 million rows. It is a parent table with 12 child tables.

Even a simple count query takes hours to complete.

select count(1) from table
where col_filter >= 'number'

The table contains 40 columns, and col_filter has the varchar data type. col_filter is not indexed.

Questions:

What should I check to find potential issues with my table setup?
I am using Microsoft SQL Server Management Studio 18. Is there any tool I can use to understand the problem and get recommendations for optimizing performance?
If indexing is the solution, is it possible to calculate how much extra space the index will occupy?

Update-1:

This table is part of a larger data-fetch query that contains a CTE and inner joins, and then uses the CTE as the base table for GROUP BY operations. The table I described above is the base table inside the CTE. Because it takes a long time even for a count, I thought it would be a good place to start debugging. Here is the full query with the estimated execution plan and the actual execution plan, which runs for 18+ hours.

Note: The actual execution plan was taken from the running query, as shown in this SO answer.

wait_info for select count(1) from table where col_filter >= 'number':

(35ms)PAGEIOLATCH_SH:dev-db:1(*)
(26ms)PAGEIOLATCH_SH:dev-db:1(*)
(86ms)PAGEIOLATCH_SH:dev-db:1(*)
(9ms)PAGEIOLATCH_SH:dev-db:1(*)

and its actual execution plan

Any suggestions would be greatly appreciated.

Best Answer

As you've already been advised (across both of your posts), an index on (col_filter) would help the example query you've provided. If you're only running aggregate queries like this, a nonclustered columnstore index might be best: you get columnar compression (which minimizes the index's disk space overhead) and better performance from batch mode operations.
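As a rough sketch of the two indexing options above: the table name dbo.MyTable and both index names are hypothetical, since the question does not give the real ones, and it assumes col_filter is not varchar(max) and fits within the index key size limit. You would normally pick one option rather than create both. The last query reads sys.dm_db_partition_stats to show how much space each index actually occupies once it is built, which is one way to answer the disk-space question after the fact.

```sql
-- Hypothetical names: dbo.MyTable, IX_MyTable_col_filter, and
-- NCCI_MyTable_col_filter are placeholders; only col_filter comes from the question.

-- Option 1: a rowstore nonclustered index to support the range filter.
-- Assumes col_filter is not varchar(max), which cannot be an index key column.
CREATE NONCLUSTERED INDEX IX_MyTable_col_filter
    ON dbo.MyTable (col_filter);

-- Option 2: a nonclustered columnstore index for aggregate-style queries.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_MyTable_col_filter
    ON dbo.MyTable (col_filter);

-- Measure the space each index on the table uses (data pages are 8 KB each).
SELECT
    i.name      AS index_name,
    i.type_desc,
    SUM(ps.used_page_count) * 8 / 1024.0 AS used_mb
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
    ON  i.object_id = ps.object_id
    AND i.index_id  = ps.index_id
WHERE ps.object_id = OBJECT_ID(N'dbo.MyTable')
GROUP BY i.name, i.type_desc;
```

Building the index on a restored copy or a lower environment first is a cheap way to see the size and the plan change before touching production.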

Beyond that, as Martin pointed out on your other post, 4 hours is still unusually long even for scanning the whole table of 1.5 million rows. But since you're in the cloud, there are a number of things that could be bottlenecking your queries. It's hard to say without seeing the actual execution plan. While you're waiting for your query to finish, you can also run sp_WhoIsActive in a separate query window to see what it's waiting on (wait types) and whether there are any blocking processes. That would be helpful to know too.
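For illustration, a check like that might look as follows. This is a minimal sketch that assumes sp_WhoIsActive (a community stored procedure, not part of SQL Server) is already installed. The second query is a built-in fallback that reads sys.dm_exec_requests directly.

```sql
-- Assumes sp_WhoIsActive is installed in the current database.
EXEC sp_WhoIsActive
    @get_plans = 1,           -- include the query plan for each active request
    @find_block_leaders = 1;  -- report how many sessions each session is blocking

-- Fallback using only built-in DMVs, in case the procedure is not installed.
SELECT
    r.session_id,
    r.status,
    r.wait_type,
    r.wait_time,              -- milliseconds spent in the current wait
    r.last_wait_type,
    r.blocking_session_id     -- nonzero when another session blocks this one
FROM sys.dm_exec_requests AS r
WHERE r.session_id <> @@SPID; -- exclude this monitoring session
```

In the wait_info posted in the question, PAGEIOLATCH_SH means the session is waiting for data pages to be read from disk into memory, which is what you would expect from a scan of a table that does not fit in the buffer pool.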

Start with indexing, and see if that makes a significant improvement (i.e., your example query shouldn't take more than a minute, and even that is slow, to be honest). If it's still problematic, please update your post with your table's definition and the actual execution plan of the slow query, which you can upload to Paste The Plan and then link in your post.

Related Solutions

SQL Server Execution Plan – Actual Number of Rows Too High

The number of records in the TransmittedManifests table is much smaller than the number of records in LWTest. In such a scenario, a good solution is to use the NOT EXISTS approach (as in Query 2) to reduce the number of actual rows. Refer to Joins without JOIN - Rob Farley.
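The original Query 2 is not reproduced here, so the following is only a sketch of the general NOT EXISTS shape. The join column ManifestID is a hypothetical placeholder, and which table sits on the outer side is an assumption.

```sql
-- Hypothetical column: ManifestID stands in for the real join key.
-- Returns LWTest rows that have no matching row in TransmittedManifests.
SELECT l.*
FROM LWTest AS l
WHERE NOT EXISTS (
    SELECT 1
    FROM TransmittedManifests AS t
    WHERE t.ManifestID = l.ManifestID
);
```

Because NOT EXISTS only needs to know whether a match exists, the optimizer can use an anti semi join and stop probing at the first match instead of producing every joined row.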

Martin Smith's comment helped me demystify the count per execution (estimate) versus the total count (actual). For Query 1, ActualExecutions="904" for the LWTest table, so 128385/904 = 142 is the actual number of rows per execution, which is reasonably close to the estimate of 104.

In SQL Server 2005, ActualExecutions can be seen by running a Profiler trace with the performance events added, as shown in the diagram below. [In SQL Server 2012, I could see this information in the execution plan diagram itself.]

XML

SQL Server 2005 Trace Settings

SQL Server – How to Get Actual Execution Plan for Cancelled Query

In SQL Server Management Studio 2016 you can watch the execution plan while the query is running by enabling the Include Live Query Statistics option. This works against SQL Server 2014 and later.

For SQL Server 2014 I usually use this query to get the execution plans of currently executing queries:

SELECT
    r.session_id
,   r.start_time
,   TotalElapsedTime_ms = r.total_elapsed_time
,   r.[status]
,   r.command
,   DatabaseName = DB_Name(r.database_id)
,   r.wait_type
,   r.last_wait_type
,   r.wait_resource
,   r.cpu_time
,   r.reads
,   r.writes
,   r.logical_reads
,   t.[text] AS [executing batch]
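,   SUBSTRING(
                -- Offsets are zero-based byte positions in Unicode text (2 bytes per
                -- character), while SUBSTRING is one-based, hence the + 1 adjustments.
                t.[text], (r.statement_start_offset / 2) + 1, 
                (   CASE WHEN r.statement_end_offset = -1 THEN DATALENGTH (t.[text]) 
                         ELSE r.statement_end_offset 
                    END - r.statement_start_offset ) / 2 + 1
             ) AS [executing statement] 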
,   p.query_plan
FROM
    sys.dm_exec_requests r
CROSS APPLY
    sys.dm_exec_sql_text(r.sql_handle) AS t
CROSS APPLY 
    sys.dm_exec_query_plan(r.plan_handle) AS p
ORDER BY 
    r.total_elapsed_time DESC;

This returns each currently executing statement along with its corresponding estimated query plan.
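If you need runtime row counts rather than the estimated plan, newer versions expose the in-flight plan through sys.dm_exec_query_statistics_xml. This is a sketch only: it assumes SQL Server 2016 SP1 or later (or Azure SQL Database) with query profiling active for the target session. Without profiling, no plan may come back for that session.

```sql
-- Assumes SQL Server 2016 SP1+ or Azure SQL Database, with query profiling
-- (for example lightweight profiling) active for the sessions of interest.
SELECT
    r.session_id,
    r.total_elapsed_time,
    qsx.query_plan            -- in-flight plan with the row counts seen so far
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_query_statistics_xml(r.session_id) AS qsx
WHERE r.session_id <> @@SPID  -- exclude this monitoring session
ORDER BY r.total_elapsed_time DESC;
```

Capturing this just before cancelling a long-running query gives you a plan with partial actual row counts to compare against the estimates.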
