Sql-server – SQL server 2012 inner join estimated number of rows issue

cardinality-estimatesexecution-plansql serversql-server-2012

I have a simple join like the one below:

select * 
from fact_sales f
join dim_company d on f.company_SK = d.company_SK

The fact table contains a little over 500 million records (with NC columnstore), the query will return all of them, however, the estimated number of rows after the join is only 300 million. Up till the hash join, the estimated number of rows are correct, only after the join it drops to 300 million. Here's the estimated plan for the query:

enter image description here

I've updated the stats on the SK column used in the join in both the fact table (with fullscan) and the dimension table, here's the histogram for each:

enter image description here

This issue only seems to occur for a couple dimension tables in the database, joining the other dimension tables doesn't produce the same type of cardinality estimate issue – any suggestions on how to possibly fix this or further investigate?

If I add a where clause to the query, it correctly estimates the number of rows before/after the join, e.g.

select * 
from fact_sales f
join dim_company d on f.company_SK = d.company_SK where company_SK = 1

will estimate 467,583,000 rows coming out of the join, which matches what's in the histogram.

The issue seems to only occur when I don't have any filter in the query. It's causing a problem in a bigger query (sort spill). I've narrowed it down to this particular join.

I do have an FK constraint, but they have been turned off (WITH NOCHECK) on the fact table (we were told to turn them off so ETL can go faster). Unfortunately, turning the FK back on is not an option 🙁

Update: enabling trace flag 2301 solved the issue :p

Best Answer

I was able to reproduce your issue in SQL Server 2014 with the legacy CE and TF 4199 enabled. I used a rowstore fact table because I have little experience with columnstore.

For your query with the filter, the query optimizer rewrites the query to something like this:

select * 
from fact_sales f
join dim_company d on f.company_SK = d.company_SK 
where f.company_SK = 1 and d.company_SK = 1

For that query the optimizer can directly uses the histogram step containing 1 for both tables and you get 467,583,000 X 1 as the estimated number of rows.

The query without a filter will use linear interpolation:

For joins with a single equality or inequality predicate, the legacy CE joins the histograms on the join columns by aligning the two histograms step-by-step using linear interpolation.

Note that the statistics for your dimension table do not have a histogram step with a RANGE_HI_KEY value of 1, which is where almost all of the data is in your fact table. Something is going wrong with the linear interpolation step for that value. If I create your dimension table without a 0 value then the histogram gets a step with a RANGE_HI_KEY value of 1. That fixes the estimate. In short, you got unlucky with your histogram. I don't think there is a supported way to force the histogram on the dimension table to contain a step for 1, unless perhaps you are willing to add additional filters to your query.

I don't know the exact rules of your data, but I can give you two workarounds. The first workaround is to enable trace flag 2301 for the query. That fixes the estimate for this part of the query but may have other negative effects for the rest of your query. This trace flag is documented by MS, but talk to your DBA before using it.

The second workaround is to encourage the query optimizer to use a different method of cardinality estimation for the join. The following query gave a good estimate for me and only does a single scan on your fact table:

select * 
from FACT_SALES f
join (SELECT COMPANY_SK FROM DIM_COMPANY UNION SELECT 1) d
on f.company_SK = d.company_SK;