Sql-server – Query cycles between suspended and runnable without ever completing

blocking, query-performance, sql-server, sql-server-2012

I have an INSERT statement that extracts data through multiple inner joins across about 8 tables. It keeps switching between the suspended and runnable states indefinitely and never gets back to the running state.

Here is the estimated execution plan: https://www.brentozar.com/pastetheplan/?id=Sy6gPl-6L

While tracing this particular process, I can see that it holds locks on tempdb and on one user database, but it isn't running any statements against them; its state simply keeps changing between suspended and runnable. When I kill it and re-run it, it executes normally. I see no other processes or jobs conflicting with this insert statement.

While the query is stuck, the wait types are IO_COMPLETION and SOS_SCHEDULER_YIELD. I have used sys.sysprocesses, sp_who2 'active' and sys.dm_exec_requests to monitor it. I also ran a trace while the process was running: once it went into the suspended state, the wait type kept alternating between IO_COMPLETION and SOS_SCHEDULER_YIELD.
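For reference, the state and wait information that sp_who2 and sys.sysprocesses show can be captured from sys.dm_exec_requests in a single query, together with the counters that reveal whether the request is still doing work. This is a sketch; session ID 53 is a placeholder for the SPID of the stuck insert.

```sql
-- Snapshot of one request's state, waits and resource counters.
-- 53 is a placeholder: replace it with the session_id of the stuck INSERT.
DECLARE @spid int = 53;

SELECT r.session_id,
       r.status,            -- running / runnable / suspended
       r.command,
       r.wait_type,         -- current wait (NULL while running)
       r.wait_time,         -- ms spent in the current wait
       r.last_wait_type,
       r.cpu_time,
       r.reads,
       r.writes,
       r.logical_reads,
       t.text AS sql_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id = @spid;
```

If reads, writes and cpu_time keep climbing between snapshots, the request is making progress (slowly) rather than being truly stuck.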

I can't figure out why this is happening. Could you please shed some light on it and suggest any solutions?

Best Answer

Explanation of behavior

Some of the causes of the IO_COMPLETION wait type are:

Writing intermediate sort buffers to disk (these are called ‘Bobs’)

Reading and writing sort results from/to disk during a sort spill

There are two sorts in the plan that could be spilling to tempdb.
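One way to check for spills while the query is running is to watch how much tempdb space the session's tasks allocate; sort and hash spills and worktables are counted as "internal objects". A minimal sketch, with session ID 53 as a placeholder:

```sql
-- tempdb pages allocated by the currently running tasks of one session.
-- Spills and worktables show up under the internal_objects_* counters.
-- 53 is a placeholder session_id.
DECLARE @spid int = 53;

SELECT tsu.session_id,
       SUM(tsu.internal_objects_alloc_page_count)   AS internal_pages_alloc,
       SUM(tsu.internal_objects_dealloc_page_count) AS internal_pages_dealloc,
       SUM(tsu.user_objects_alloc_page_count)       AS user_pages_alloc,
       -- pages are 8 KB; convert the internal allocation to MB
       SUM(tsu.internal_objects_alloc_page_count) * 8 / 1024.0 AS internal_mb_alloc
FROM sys.dm_db_task_space_usage AS tsu
WHERE tsu.session_id = @spid
GROUP BY tsu.session_id;
```

A steadily growing internal_pages_alloc while the request flips between IO_COMPLETION and SOS_SCHEDULER_YIELD would be consistent with the sorts spilling; the actual execution plan should also show spill warnings on the Sort operators.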

Another source of the slowness could be the unfortunate many-to-many merge join in the middle of the execution plan (which also uses tempdb, although it doesn't cause IO_COMPLETION waits).

The estimated plan shows ~2 GB of data coming out of that merge join, and that amount could be even higher if the estimates are off.

You mentioned the problem is intermittent, which could be because of tempdb contention (if other queries are running at the time).
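To see whether tempdb contention coincides with the slow runs, one common check is to look for tasks waiting on tempdb page latches while the insert is running. A sketch:

```sql
-- Tasks currently waiting on page latches in tempdb (database_id 2).
-- resource_description has the form dbid:fileid:pageid, so '2:%' limits it to tempdb.
SELECT wt.session_id,
       wt.wait_type,
       wt.wait_duration_ms,
       wt.resource_description
FROM sys.dm_os_waiting_tasks AS wt
WHERE wt.wait_type LIKE 'PAGELATCH%'
  AND wt.resource_description LIKE '2:%';
```

Many rows here, especially on page 1, 2 or 3 of a data file (PFS, GAM and SGAM pages), would point toward tempdb allocation contention rather than a problem with this query alone.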

Suggestions for improvement

Temp table rewrite

The best option I can think of would be to break the query up into smaller chunks. For instance, you could just select the trans_header rows that meet your WHERE clause into a #temp table, then use that temp table in the main query instead of trans_header.

This could improve estimates and let the optimizer produce a better plan, potentially avoiding any spills or other tempdb activity.

That generally looks like this (sorry if there are any slight typos; it's tough not having IntelliSense):

SELECT 
    a.[term_id],
    a.[trans_ref_no],
    a.supplier_no,
    a.cust_no,
    a.trans_id,
    a.carrier,
    a.folio_yr, 
    a.folio_mo, 
    a.folio_no,
    a.transaction_date
INTO #trans_header_temp
FROM [TOPHAT].[trans_header] AS a with (nolock)
WHERE
    a.folio_yr=year(getdate())-1
    AND a.transaction_date <>' 00000';

INSERT INTO [TOPHAT].[terminal_volume]
           ([term_id]
           ,[terminal_name]
           ,[folio_yr]
           ,[folio_mo]
           ,[folio_no]
           ,[folio_year_month]
           ,[product_id]
           ,[prod_name]
           ,[product_id_name]
           ,[supplier_no]
           ,[supplier_name]
           ,[supplier_no_name]
           ,[customer_no]
           ,[customer_name]
           ,[carrier_no]
           ,[carrier_name]
           ,[bbl]
           ,[gallon]
           ,[customer_no_name]
           ,[trans_code]
           ,[transaction_description]
           ,[transaction_code_description]
           ,[transaction_year]
           ,[transaction_month]
           ,[transaction_date]
           ,[getdate])
select a.[term_id],
 tp.name as terminal_name,
 a.folio_yr, 
 a.folio_mo, 
 a.folio_no,
 cast(a.folio_yr + '-' + a.folio_mo +'-01' as date) as folio_year_month,
 p.prod_id as product_id, 
 p.prod_name as prod_name,
 p.prod_id+' '+p.prod_name  as product_id_name, 
 s.supplier_no,
 s.supplier_name,
 s.supplier_no+' '+s.supplier_name as supplier_no_name,
 cu.cust_no as customer_no, 
 cu.cust_name as customer_name, 
 a.carrier as carrier_no, 
 ca.name as carrier_name,
 sum((convert(decimal(12,2),b.net)*(case when b.sign is null then 1 else -1 end))/42) as 'bbl',
 sum(convert(decimal(12,2),b.net)*(case when b.sign is null then 1 else -1 end)) as gallon,
 cu.cust_no+' '+ cu.cust_name as customer_no_name,
 tv.trans_code, 
 tv.trans_desc as transaction_description, 
 tv.trans_code+' '+tv.trans_desc as transaction_code_description, 
 '20'+ left(a.transaction_date, 2) as transaction_year, 
 substring(a.transaction_date, 3, 2) as transaction_month,
  case when isdate(a.transaction_date) = 0 then null
   else 
   cast(a.transaction_date as date) end as transaction_date, 
 cast(getdate() as date)  as [getdate]
   from
#trans_header_temp as a with (nolock) 
 inner join [TOPHAT].[terminal_profile] as tp  with (nolock) 
  on a.term_id = tp.term_id
 inner join [TOPHAT].[trans_products] as b with (nolock) 
  on a.[trans_ref_no] = b.[trans_ref_no] 
 inner join [TOPHAT].[product] as p with (nolock) 
  on p.term_id = a.term_id
  and p.prod_id = b.prod_id
 inner join TOPHAT.supplier as s with (nolock) 
  on s.supplier_no = a.supplier_no
 inner join TOPHAT.customer as cu with (nolock) 
  on cu.cust_no = a.cust_no 
  and cu.supplier_no = a.supplier_no
 inner join TOPHAT.trans_value as tv with (nolock) 
  on tv.trans_code = a.trans_id
 inner join [TOPHAT].[carrier] as ca with (nolock)
  on ca.term_id = a.term_id
  and ca.carr_no = a.carrier

group by 
a.[term_id],
tp.name ,
a.folio_yr, 
a.folio_mo, 
a.folio_no,
p.prod_id,
p.prod_name,
p.prod_id+' '+p.prod_name,
s.supplier_no,
s.supplier_name,
s.supplier_no+' '+s.supplier_name,
cu.cust_no, 
cu.cust_name, 
cu.cust_no+' '+ cu.cust_name,
a.carrier, 
ca.name,
tv.trans_code,
tv.trans_desc,
tv.trans_code+' '+tv.trans_desc,
'20'+ left(a.transaction_date, 2),
substring(a.transaction_date, 3, 2) ,
a.transaction_date,
cast(a.folio_yr + '-' + a.folio_mo +'-01' as date)
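One optional variation on this rewrite, not part of the original suggestion: index the temp table on the column it is joined on before running the insert, and drop it afterwards. The index name is hypothetical, and whether the index helps depends on the plan the optimizer picks, so compare actual plans with and without it.

```sql
-- Run after the SELECT ... INTO #trans_header_temp and before the INSERT.
-- cx_trans_header_temp is a hypothetical name; the key is the column
-- joined to [TOPHAT].[trans_products].
CREATE CLUSTERED INDEX cx_trans_header_temp
    ON #trans_header_temp ([trans_ref_no]);

-- (run the INSERT INTO [TOPHAT].[terminal_volume] statement here)

-- Clean up once the insert has finished.
DROP TABLE #trans_header_temp;
```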

Fix implicit conversions

Speaking of estimates, the plan has several implicit conversion warnings. It's difficult to advise on how to deal with them without the table and index definitions, but you should review them to see whether they can be avoided, especially the ones in the WHERE clause.
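As an illustration, take the predicate a.folio_yr=year(getdate())-1. The SELECT list concatenates folio_yr with '-', which suggests folio_yr is a character column (an assumption, since the table definition isn't available). Comparing it to an int converts the column to int, which can prevent an index seek. A sketch of converting the constant instead, assuming folio_yr stores a four-digit year as text:

```sql
-- Original: the character column folio_yr is implicitly converted to int.
--     WHERE a.folio_yr = year(getdate())-1
--
-- Rewrite: convert the constant, not the column, so the predicate stays sargable.
-- char(4) assumes folio_yr holds a four-digit year such as '2019'.
SELECT a.[term_id],
       a.[trans_ref_no]
FROM [TOPHAT].[trans_header] AS a
WHERE a.folio_yr = CAST(YEAR(GETDATE()) - 1 AS char(4))
  AND a.transaction_date <> ' 00000';  -- literal kept exactly as in the original query
```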

Join hint

One low-effort approach would be to try to avoid that specific merge join with a join hint. This might make performance worse, though, because a join hint forces the joins to be done in the order they are written, which quite a bit limits what the optimizer can do:

 inner HASH join [TOPHAT].[product] as p with (nolock) 
  on p.term_id = a.term_id
  and p.prod_id = b.prod_id

Note that this likely wouldn't help with the sort spills.
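A possible middle ground is a query-level join hint instead of the table-level one. Listing more than one join type in OPTION lets the optimizer choose among them, and unlike inner HASH join it does not force the written join order. The trade-off is that it rules out every merge join in the statement, not just the many-to-many one. A sketch of the end of the same INSERT ... SELECT statement:

```sql
-- ...same INSERT INTO [TOPHAT].[terminal_volume] ... SELECT ... GROUP BY ... as above,
-- then append the hint as the last clause of the statement:
OPTION (LOOP JOIN, HASH JOIN);  -- merge joins excluded; join order left to the optimizer
```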

Related Solutions

Sql-server – Need help with long running query

I followed some 'basic query tuning' steps as explained in this article: http://www.simple-talk.com/sql/performance/simple-query-tuning-with-statistics-io-and-execution-plans/

I used sp_whoisactive and SET STATISTICS IO ON to find where the reads were happening, and then added indexes based on the execution plan.

This resulted in adding covering indexes to each work table. A couple of queries still take about 2 seconds, but the majority now run in under a second, some in under a tenth of a second.
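A minimal sketch of that workflow: measure reads with SET STATISTICS IO, then add a covering index whose key supports the filter and whose INCLUDE list carries the other selected columns. The table, column and index names are hypothetical, for illustration only.

```sql
-- Report logical/physical reads per table for each statement in this session.
SET STATISTICS IO ON;

-- Hypothetical query against a hypothetical work table.
SELECT w.order_id, w.status, w.amount
FROM dbo.work_table AS w
WHERE w.customer_id = 42;

SET STATISTICS IO OFF;

-- Covering index: the key supports the WHERE clause and INCLUDE carries the
-- remaining selected columns, so the query needs no key lookups.
CREATE NONCLUSTERED INDEX ix_work_table_customer_id
    ON dbo.work_table (customer_id)
    INCLUDE (order_id, status, amount);
```

Re-running the query with STATISTICS IO on after creating the index shows whether the logical reads for dbo.work_table actually dropped.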

Sql-server – Specific TempDB insert of UserDB select results in SOS_SCHEDULER_YIELD to ENCRYPTION_SCAN

The ENCRYPTION_SCAN resource doesn't only show up in your wait list when encryption (such as TDE) is in use.

Certain operations take a shared lock on this resource to make sure the database is not being encrypted while the operation runs.

As soon as you encrypt a user database with TDE, tempdb is encrypted as well (otherwise, user data that ends up in tempdb would be a security risk).

Therefore, some operations take a shared lock on ENCRYPTION_SCAN in tempdb to prevent tempdb from being encrypted while they run.
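To see which databases on an instance currently have a database encryption key (including tempdb), sys.dm_database_encryption_keys can be queried. A sketch:

```sql
-- Every database that has a database encryption key.
-- encryption_state: 1 = unencrypted, 2 = encryption in progress, 3 = encrypted.
-- tempdb (database_id 2) is listed once any user database on the instance uses TDE.
SELECT DB_NAME(dek.database_id) AS database_name,
       dek.database_id,
       dek.encryption_state,
       dek.percent_complete
FROM sys.dm_database_encryption_keys AS dek;
```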

Here are two examples:

BULK INSERT

IF object_id('tempdb..##NumberCreation') IS NOT NULL
    drop table ##NumberCreation
GO

--create temp table to hold numbers
create table ##NumberCreation (C int NOT NULL);
GO

-- CREATE Numbers by using trick from Itzik -> http://sqlmag.com/sql-server/virtual-auxiliary-table-numbers 
WITH L1 AS ( SELECT 1 as C UNION SELECT 0 ),
    L2 AS ( SELECT 1 as C FROM L1 CROSS JOIN L1 as B ),
    L3 AS ( SELECT 1 as C FROM L2 CROSS JOIN L2 as B ),
    L4 AS ( SELECT 1 as C FROM L3 CROSS JOIN L3 as B ),
    L5 AS ( SELECT 1 as C FROM L4 CROSS JOIN L4 as B ),
    L6 AS ( SELECT 1 as C FROM L5 CROSS JOIN L5 as B),
    Nums as (SELECT ROW_NUMBER() OVER (ORDER BY C) as C FROM L6) 
insert ##NumberCreation(C)
SELECT TOP 500000 C
FROM Nums

The above code generates 500k records in a global temp table. You can export them with the following commands; if you run them from SSMS, make sure you are in SQLCMD mode:

--Export
!!bcp ##NumberCreation out "E:\SQLServer\Backup\test\export.dat" -T -n

--format file
!!bcp ##NumberCreation format nul -T -n  -f "E:\SQLServer\Backup\test\export.fmt"

Make sure to choose a directory where the SQL Server service account has write permissions, and if you run this from SSMS, run it locally on the SQL Server machine.

Next, start a bulk insert loop. While the loop is running, open a second query window and keep running sp_lock until you see the ENCRYPTION_SCAN shared lock in DB_ID 2 (which is tempdb).

The bulk import loop:

BEGIN
    IF OBJECT_ID('tempdb..#Import') IS NOT NULL
        DROP TABLE #Import ;

    CREATE TABLE #Import (C INT) ;
    BULK INSERT #Import
    FROM 'E:\SQLServer\Backup\test\export.dat' WITH (FORMATFILE='E:\SQLServer\Backup\test\export.fmt', FIRSTROW=1, TABLOCK) ;
END
GO 500 --run it 500 times

The result of sp_lock in the second window:

[Image: sp_lock output showing the shared ENCRYPTION_SCAN lock in DB_ID 2]
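Since sp_lock is deprecated, the second window could instead poll sys.dm_tran_locks, where database-level locks carry their subtype in resource_subtype. A sketch; if it returns nothing while the loop runs, removing the resource_subtype filter lists all database-level locks in tempdb for inspection.

```sql
-- Shared ENCRYPTION_SCAN locks currently held or requested in tempdb.
SELECT tl.request_session_id,
       tl.resource_type,
       tl.resource_subtype,
       tl.resource_database_id,
       tl.request_mode,     -- expected: S
       tl.request_status
FROM sys.dm_tran_locks AS tl
WHERE tl.resource_type = 'DATABASE'
  AND tl.resource_subtype = 'ENCRYPTION_SCAN'
  AND tl.resource_database_id = 2;   -- tempdb
```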

SORT IN TEMPDB

With the same temp table in place, start this very simple loop:

SELECT * from #Import order by C
go 50

It produces the following execution plan:

[Image: execution plan with a Sort operator]

(Make sure that #Import is actually populated, since depending on when you stopped the previous bulk import loop, it could be empty!)

Again, run sp_lock in a second window until you see the ENCRYPTION_SCAN resource popping up:

[Image: sp_lock output showing the ENCRYPTION_SCAN resource]

Now you know why this resource wait shows up. It may very well not be your problem; I just wanted to point out the other reasons that make ENCRYPTION_SCAN show up. The reason for your query slowdown might be something else. I'll leave improving your query plan to the query plan experts on this site ;-) However, could you post the actual execution plan as well, instead of just the estimated plan?
