Sql-server – CXPACKET Waits performance tune for SQL Server 2008

parallelismsql serversql-server-2008

I have a SQL Server 2008 query that is working on millions of records. The query is w/in a proc that is run nightly by a job. The query can take a full day to run when I first put it on the server, but within a week or so it will drop down to less than one hour–without any intervention from me. It FIXES ITSELF somehow.

The query runs in tempdb and, before it fixes itself, when I examine performance stats on it I find the following:
CXPACKET: 20,700sec or 66% of wait time.
PAGEIOLATCH:_SH 2,500 or 8% of wait time.
LOGBUFFER: 1500sec or 5% of wait time
IO_COMPLETION: 1500sec or 5% of wait time

I tried to tune indexes, etc. and the stats above are somewhat of an improvement over my first run when CXPACKET was 77% of wait time. I read trouble shooting tips that said I should split my tempdb into one file for each CPU. I have a dual cpu 32 bit W2K8 system and so I split tempdb into 2 files and greatly increased the size of each to 150 GB each 10% autogrow, but they aren't growing so I think the size is sufficient.

When I looked at the server while the query was running I could see that the CPUs were NOT pinned and were hovering down around <10% of their capacity. What was pinned was DISK IO. The machine has a single disk.

Without further ado, here are the two queries causing the trouble (the first query used to be a subquery of the latter-see explanation below):

insert into #ttcbm(tradeId1, tradeId2)
select distinct tp.tradeid tradeId1, tp1.tradeid tradeId2
from #tradeP tp
join #tradeP tp1    
on tp.cmbId = tp1.cmbId
and tp.qs_plyrid = tp1.qs_plyrid    
and tp.tradeId > tp1.tradeId    
OPTION (MAXDOP 1)


insert into #mergeT(tradeId1, tradeId2)
select distinct tp.tradeid tradeId1, tp1.tradeid tradeId2
from #tradep tp
join #tradep tp1
on tp.cmbId = tp1.cmbId
and tp.tradeid > tp1.tradeId
left join #ttcbm x
on tp.tradeId = x.tradeId1
and tp1.tradeId = x.tradeId2
where 1 = 1
and x.tradeId1 is null
and x.tradeId2 is null
OPTION (MAXDOP 1);

I added MAXDOP 1 per a troubleshooting tip I read that said CXPACKET was caused by parallelism, and perhaps it did help drive down my waits a bit, but not like the improvement that happens when the query fixes itself, i.e., from 24 hours to lest than 1 hr.

the #ttcbm table has a PK of tradeid1, tradeid2 and #tradep has a pk of (cmbId, qs_plyrid, tradeid) and both tables have record counts on the order of 100K to 500k. #ttcbm used to be a subquery of the latter 'insert into #mergeT' query, but I separated it out when I read that separating out complicated queries can improve performance when parallelism is a problem.

Best Answer

There is a lot of misunderstanding about CXPACKET. CXPACKET isn't the cause of your problems, it is a side effect. What CXPACKET means when you see it is that that thread of a parallel query is waiting for another thread of that query to do something. Just because you see a lot of CXPACKET waits doesn't mean there's a problem with the query, it means that there is a problem somewhere else. When you see CXPACKET waits you need to look at the other threads of the SPID and see what other waits are besides CXPACKET. That other wait is the problem.

To your specific issue, the reason that the run times are so crazy is probably because the SQL Server is generating a different plan on some days because the statistics are out of date (making the job run long). Then you either manually update statistics (or via a job) or auto stats kicks in and then the plan gets better and the job runs quickly again.

Once you've solved the stats problem you can start looking at other reasons why the job is running slow. The fact that you only have a single disk is not helping any for sure.