I have a query where using
select * not only does far fewer reads, but also uses significantly less CPU time than using
This is the query:
select top 1000 c.ID from ATable a join BTable b on b.OrderKey = a.OrderKey and b.ClientId = a.ClientId join CTable c on c.OrderId = b.OrderId and c.ShipKey = a.ShipKey where (a.NextAnalysisDate is null or a.NextAnalysisDate < @dateCutOff) and b.IsVoided = 0 and c.ComplianceStatus in (3, 5) and c.ShipmentStatus in (1, 5, 6) order by a.LastAnalyzedDate
This finished with 2,473,658 logical reads, mostly in Table B. It used 26,562 CPU and had a duration of 7,965.
This is the query plan generated:
On PasteThePlan: https://www.brentozar.com/pastetheplan/?id=BJAp2mQIQ
When I change
*, the query finished with 107,049 logical reads, fairly evenly spread between all three tables. It used 4,266 CPU and had a duration of 1,147.
This is the query plan generated:
On PasteThePlan: https://www.brentozar.com/pastetheplan/?id=SyZYn7QUQ
I attempted to use the query hints suggested by Joe Obbish, with these results:
select c.ID without hint: https://www.brentozar.com/pastetheplan/?id=SJfBdOELm
select c.ID with hint: https://www.brentozar.com/pastetheplan/?id=B1W___N87
select * without hint: https://www.brentozar.com/pastetheplan/?id=HJ6qddEIm
select * with hint: https://www.brentozar.com/pastetheplan/?id=rJhhudNIQ
OPTION(LOOP JOIN) hint with
select c.ID did drastically reduced the number of reads compared to the version without the hint, but it is still doing about 4x the number of reads the
select * query without any hints. Adding
OPTION(RECOMPILE, HASH JOIN) to the
select * query made it perform much worse than anything else I have tried.
After updating statistics on the tables and their indexes using
WITH FULLSCAN, the
select c.ID query is running much faster:
select c.ID before update: https://www.brentozar.com/pastetheplan/?id=SkiYoOEUm
select * before update: https://www.brentozar.com/pastetheplan/?id=ryrvodEUX
select c.ID after update: https://www.brentozar.com/pastetheplan/?id=B1MRoO487
select * after update: https://www.brentozar.com/pastetheplan/?id=Hk7si_V8m
select * still outperforms
select c.ID in terms of total duration and total reads (
select * has about half the reads) but it does use more CPU. Overall they are much much closer than before the update, however the plans still differ.
The same behavior is seen on 2016 running in 2014 Compatibility mode and on 2014. What could explain the disparity between the two plans? Could it be that the "correct" indexes have not been created? Could statistics being slightly out of date cause this?
I tried moving the predicates up to the
ON part of the join, in multiple ways, but the query plan is the same each time.
After Index Rebuilds
I rebuilt all of the indexes on the three tables involved in the query.
c.ID is still doing the most reads (over twice as many as
*), but CPU usage is about half of the
* version. The
c.ID version also spilled into tempdb on the sorting of
I also tried forcing it to operate without parallelism, and that gave me the best performing query: https://www.brentozar.com/pastetheplan/?id=SJn9-vuLX
I notice the execution count of operators AFTER the big index seek that is doing the ordering only executed 1,000 times in the single-threaded version, but did significantly more in the Parallelized version, between 2,622 and 4,315 executions of various operators.