SQL Server – What are the thresholds in the SQL Server 2014 “big table join to small table” cardinality estimation optimization?

cardinality-estimates, sql-server, sql-server-2014

The SQL Server 2014 Cardinality Estimator white paper says:

The new CE, however, uses a simpler algorithm that assumes that there is a one-to-many join association between a large table and a small table. This assumes that each row in the large table matches exactly one row in the small table. This algorithm returns the estimated size of the larger input as the join cardinality.

But it doesn't say how SQL Server determines what is a "large table" and "small table" for purposes of this optimization.

Are these criteria documented anywhere? Is it a simple threshold (e.g. "small table" must be under 10,000 rows), a percentage (e.g. "small table" must be <5% of rows in the "large table"), or some more complicated function?

Also, is there a trace flag or query hint that forces use of this optimization for a particular join?

Finally, does this optimization have a name that I can use for further Googling?

I'm asking because I want this "use the cardinality of the large table" cardinality estimation behavior in a join of master/detail tables, but my "small table" (master) is 1M rows and my "big table" (detail) is 22M rows. So I'm trying to learn more about this optimization to see if I can adjust my queries to force use of it.

Best Answer

The white paper does not define "large" for any of its examples; it uses the terms "large" and "small" to help explain the math the new CE does compared to the legacy CE.

The section you referenced describes a join whose predicate mixes equality and inequality conditions. The new CE looks at the row counts of the two tables, determines which one is "large", and uses that row count as the join estimate. The legacy CE didn't look at row counts; it would just multiply the selectivity of each predicate.
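To make the difference concrete, here is a rough Python sketch of the two calculations as described above. It is not SQL Server's actual code. The function names are made up for illustration, the row counts are the ones from your question, and both selectivity values are assumptions picked only to show the arithmetic.

```python
def legacy_style_estimate(rows_a, rows_b, predicate_selectivities):
    """Legacy-style idea: scale the cross product of the two inputs
    by the product of the individual predicate selectivities."""
    combined = 1.0
    for selectivity in predicate_selectivities:
        combined *= selectivity
    return rows_a * rows_b * combined


def new_style_estimate(rows_a, rows_b):
    """New-CE idea from the white paper: return the estimated size
    of the larger input as the join cardinality."""
    return max(rows_a, rows_b)


# Row counts taken from the question.
master_rows = 1_000_000   # "small" table
detail_rows = 22_000_000  # "large" table

# ASSUMED selectivities, for illustration only.
equality_selectivity = 1 / master_rows  # assumes one match per master key
inequality_selectivity = 0.3            # hypothetical guess

legacy = legacy_style_estimate(
    master_rows, detail_rows, [equality_selectivity, inequality_selectivity]
)
new = new_style_estimate(master_rows, detail_rows)

print(f"legacy-style estimate: {legacy:,.0f} rows")
print(f"new-style estimate:    {new:,} rows")
```

With these assumed inputs, the legacy-style product lands well below the size of the detail table, while the new-style calculation simply returns the detail table's row count. Note that nothing in this sketch involves a threshold: "large" just means the larger of the two inputs.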

HTH