SQL Server – Why is selecting all resulting columns of this query faster than selecting the one column I care about?

execution-plan, performance, query-performance, sql-server, sql-server-2014

I have a query where using select * not only does far fewer reads but also uses significantly less CPU time than selecting the single column c.ID.

This is the query:

select top 1000 c.ID
from ATable a
    join BTable b on b.OrderKey = a.OrderKey and b.ClientId = a.ClientId
    join CTable c on c.OrderId = b.OrderId and c.ShipKey = a.ShipKey
where (a.NextAnalysisDate is null or a.NextAnalysisDate < @dateCutOff)
    and b.IsVoided = 0
    and c.ComplianceStatus in (3, 5)
    and c.ShipmentStatus in (1, 5, 6)
order by a.LastAnalyzedDate

This finished with 2,473,658 logical reads, mostly against BTable. It used 26,562 CPU and had a duration of 7,965.

This is the query plan generated:

Plan from Selecting a single column's value
On PasteThePlan: https://www.brentozar.com/pastetheplan/?id=BJAp2mQIQ

When I change c.ID to *, the query finishes with 107,049 logical reads, fairly evenly spread across the three tables. It used 4,266 CPU and had a duration of 1,147.

This is the query plan generated:

Plan from Selecting all values
On PasteThePlan: https://www.brentozar.com/pastetheplan/?id=SyZYn7QUQ

I attempted to use the query hints suggested by Joe Obbish, with these results:
select c.ID without hint: https://www.brentozar.com/pastetheplan/?id=SJfBdOELm
select c.ID with hint: https://www.brentozar.com/pastetheplan/?id=B1W___N87
select * without hint: https://www.brentozar.com/pastetheplan/?id=HJ6qddEIm
select * with hint: https://www.brentozar.com/pastetheplan/?id=rJhhudNIQ

Using the OPTION (LOOP JOIN) hint with select c.ID did drastically reduce the number of reads compared to the version without the hint, but it still does about 4x the reads of the unhinted select * query. Adding OPTION (RECOMPILE, HASH JOIN) to the select * query made it perform much worse than anything else I tried.
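For reference, the hints are applied by appending an OPTION clause to the query, along these lines (a sketch using the table and column names from the query above):

select top 1000 c.ID
from ATable a
    join BTable b on b.OrderKey = a.OrderKey and b.ClientId = a.ClientId
    join CTable c on c.OrderId = b.OrderId and c.ShipKey = a.ShipKey
where (a.NextAnalysisDate is null or a.NextAnalysisDate < @dateCutOff)
    and b.IsVoided = 0
    and c.ComplianceStatus in (3, 5)
    and c.ShipmentStatus in (1, 5, 6)
order by a.LastAnalyzedDate
option (loop join) -- or: option (recompile, hash join) for the select * variant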

After updating statistics on the tables and their indexes using WITH FULLSCAN, the select c.ID query is running much faster:
select c.ID before update: https://www.brentozar.com/pastetheplan/?id=SkiYoOEUm
select * before update: https://www.brentozar.com/pastetheplan/?id=ryrvodEUX
select c.ID after update: https://www.brentozar.com/pastetheplan/?id=B1MRoO487
select * after update: https://www.brentozar.com/pastetheplan/?id=Hk7si_V8m
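For anyone repeating this, the statistics update was of this form (a sketch; the dbo schema is an assumption):

update statistics dbo.ATable with fullscan;
update statistics dbo.BTable with fullscan;
update statistics dbo.CTable with fullscan;

Run per table, this refreshes every statistics object on the table, including the statistics on its indexes.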

select * still outperforms select c.ID in total duration and total reads (select * does about half the reads), but it does use more CPU. Overall the two are much closer than before the update, though the plans still differ.

The same behavior is seen on SQL Server 2016 running in 2014 compatibility mode and on SQL Server 2014. What could explain the disparity between the two plans? Could it be that the "correct" indexes have not been created? Could slightly out-of-date statistics cause this?

I tried moving the predicates into the ON clauses of the joins, in multiple ways, but the query plan was the same each time.

After Index Rebuilds

I rebuilt all of the indexes on the three tables involved in the query. The c.ID version is still doing the most reads (over twice as many as the * version), but its CPU usage is about half that of the * version. The c.ID version also spilled to tempdb during the sort on ATable:
c.ID: https://www.brentozar.com/pastetheplan/?id=HyHIeDO87
*: https://www.brentozar.com/pastetheplan/?id=rJ4deDOIQ
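The rebuilds were of this form (a sketch; dbo schema assumed):

alter index all on dbo.ATable rebuild;
alter index all on dbo.BTable rebuild;
alter index all on dbo.CTable rebuild;

Note that rebuilding an index also refreshes that index's statistics with the equivalent of a full scan.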

I also tried forcing it to operate without parallelism, and that gave me the best performing query: https://www.brentozar.com/pastetheplan/?id=SJn9-vuLX
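A serial plan can be forced with a degree-of-parallelism hint appended to the query (I am assuming that is how it was done here):

option (maxdop 1)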

I also noticed that the operators after the big ordering index seek executed only 1,000 times in the single-threaded version, but significantly more often in the parallel version: between 2,622 and 4,315 executions for various operators.

Best Answer

It's true that selecting more columns implies that SQL Server may need to work harder to get the requested results of the query. If the query optimizer were able to come up with the perfect query plan for both queries, it would be reasonable to expect the SELECT * query to run longer than the query that selects a single column. You have observed the opposite for your pair of queries. You need to be careful when comparing costs, but the slow query has a total estimated cost of 1090.08 optimizer units and the fast query has a total estimated cost of 6823.11 optimizer units. In this case, it could be said that the optimizer does a poor job of estimating total query costs. It did pick a different plan for your SELECT * query, and it expected that plan to be more expensive, but that wasn't the case here. That type of mismatch can happen for many reasons, and one of the most common causes is cardinality estimation problems. Operator costs are largely determined by cardinality estimates, so if a cardinality estimate at a key point in a plan is inaccurate, the total cost of the plan may not reflect reality. This is a gross oversimplification, but I hope it will be helpful for understanding what's going on here.

Let's start by discussing why a SELECT * query might be more expensive than selecting a single column. SELECT * may turn some covering indexes into noncovering indexes, which can mean that the optimizer needs to do additional work (such as key lookups) to get all of the columns it needs, or that it has to read from a larger index. SELECT * may also produce larger intermediate result sets that must be processed during query execution. You can see this in action by looking at the estimated row sizes in both queries: in the fast (SELECT *) query the row sizes range from 664 to 3019 bytes, while in the slow (single-column) query they range from 19 to 36 bytes. Blocking operators such as sorts or hash builds are costed higher for data with a larger row size because SQL Server knows it's more expensive to sort larger amounts of data or to turn it into a hash table.
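To illustrate the covering-index point (with a hypothetical index, not one from the anonymized plans): an index covers a query only if every column the query references is in the index key or its INCLUDE list, so widening the select list to * can turn a seek-only plan into one that needs key lookups or a scan of the wider clustered index.

create index IX_CTable_Order
on CTable (OrderId, ShipKey)
include (ComplianceStatus, ShipmentStatus, ID);
-- covers the select c.ID query, but not select *,
-- which needs every column of CTable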

Looking at the fast query, the optimizer estimates that it needs to do 2.4 million index seeks on Database1.Schema1.Object5.Index3. That is where most of the plan cost comes from. Yet the actual plan reveals that only 1,332 index seeks were done on that operator. If you compare the actual to the estimated rows for the outer parts of those loop joins you'll see large differences. The optimizer thinks that many more index seeks will be needed to find the first 1000 rows needed for the query's results. That's why the query has a relatively high cost plan but finishes so quickly: the operator that was predicted to be the most expensive did less than 0.1% of its expected work.

Looking at the slow query, you get a plan with mostly hash joins (I believe the loop join is there just to deal with the local variable). Cardinality estimates definitely aren't perfect, but the only real estimate problem is right at the end with the sort. I suspect most of the time is spent on the scans of the tables with hundreds of millions of rows.

You may find it helpful to add query hints to each version of the query to force the plan associated with the other version. Query hints can be a good tool for figuring out why the optimizer made some of its choices. If you add OPTION (RECOMPILE, HASH JOIN) to the SELECT * query, I expect you'll see a plan similar to the slow query's, and I also expect its costs will be much higher because your row sizes are much bigger. That could be why the hash join plan wasn't chosen for the SELECT * query. If you add OPTION (LOOP JOIN) to the query that selects just one column, I expect you'll see a plan similar to the SELECT * query's. In this case, reducing the row size shouldn't have much of an impact on the overall query cost; you might skip the key lookups, but those are a small percentage of the estimated cost.

In summary, I expect that the larger row sizes needed to satisfy the SELECT * query push the optimizer towards a loop join plan instead of a hash join plan. The loop join plan is costed higher than it should be due to cardinality estimate issues. Reducing the row sizes by selecting just one column greatly reduces the cost of a hash join plan but probably won't have much of an effect on the cost for a loop join plan, so you end up with the less efficient hash join plan. It's hard to say more than this for an anonymized plan.