Sql-server – What’s a pathological case where a bitmap filter would not allow the PROBE(Field, IN-ROW) semijoin reduction optimization

performancequery-performancesql serversql-server-2017

I recently learned a performance optimization in SQL Server I did not know about ("Bitmap Filter On (Clustered) Non-Nullable Column Hash Semi-join Reduction via Scan + Probe IN-ROW" – a real mouthful), and am looking for data sets that can defeat my optimization, as this same query will be used in a nearly identical database schema in a different database with different customer use cases.

Specify OPTION(Hash Join, MAXDOP = 8)
1. MAXDOP effectively tells the query optimizer to look for plans that can exploit parallelism
2. HASH JOIN is what tells the query optimizer to consider a Bitmap Filter
Non-nullable column filtered on EXISTS clause
1. When the column is non-nullable, the semi-join (EXISTS clause) guides the optimizer to choose a clustered index scan with a predicate that says something like PROBE(NonNullableColumn, 'IN-ROW')
2. Combined with the bitmap filter, the PROBE is highly efficient because it uses a bitmap filter. This is, in effect, equivalent to a perfect Index Seek. It's like a bookmark lookup on steroids.

However, in thinking through this cleverly constructed query I just made, I'm a little scared I don't know how to break it and when to expect it to stop working and the cost-based optimizer will choose a different plan.

Suppose I have 3000 unique values for NonNullableColumn, each with 30 million rows binned to those unique values. In my performance tests so far, I passed in a set containing 15 as a unique test value. This works great and I'm pretty confident this is the optimal plan. What happens if I pass in a set containing random numbers – will SQL Server stop using the bitmap filter? What's the "worst case" scenario here?

I've attached a picture of the query plan to re-inforce what I'm trying to test and how I can defeat (or anticipate) Murphy's Law.

In thinking through "what could go wrong", I found this in SQL Server Docs:

The Repartition Streams operator (or exchange iterator) consumes multiple streams and produces multiple streams of records. The record contents and format are not changed. If the query optimizer uses a bitmap filter, the number of rows in the output stream is reduced.

and:

SQL Server uses the Bitmap operator to implement bitmap filtering in parallel query plans. Bitmap filtering speeds up query execution by eliminating rows with key values that cannot produce any join records before passing rows through another operator such as the Parallelism operator. A bitmap filter uses a compact representation of a set of values from a table in one part of the operator tree to filter rows from a second table in another part of the tree. By removing unnecessary rows early in the query, subsequent operators have fewer rows to work with, and the overall performance of the query improves. The optimizer determines when a bitmap is selective enough to be useful and in which operators to apply the filter. Bitmap is a physical operator.

Showplan Logical and Physical Operators Reference

According to Bitmap Filter (Star Join Query Optimization):

Bitmap filter doesn’t work if:

It’s not a hash join

Out of date stats/missing indexes (cardinality/rows)

Not enough memory

Not enough threads (BF only works in parallel query – you need more than 1 CPU)

In my case, 1, 2 and 4 are not problems. 3 is evidently not a problem, at least in my example test. The memory requirement essentially means my StarJoinInfo table (Build table) must be smaller enough to fit in main memory; I wish I had a formula that gave me a upper bounds on how many unique values I could create a BitMap for.

Best Answer

I figured out the answer to my question - sort of.

When you specify MAXDOP(8), the optimized bitmap filter optimization will create 8 hash buckets. The exact spread of integers across those hash buckets is probably Pearson hashing. I am curious if anyone in the comments knows the exact hashing technique, though.

Quoting: https://sqlserverfast.com/blog/hugo/2018/05/plansplaining-part-5-bitmaps/

Simple example: Let’s say that only three stores in our database have less than 30 employees. These stores have StoreKey 17, 38, and 82. The first row the Bitmap operator receives is for StoreKey 17; it hashes the number 17 and the result is 29, so bit 29 in the bitmap is set to one. It then reads the next row, for StoreKey 38; hashes the number 38 which results in 15; and sets bit 15 to one. The third row it reads is for StoreKey 82; due to a hash collision this number also hashes to 29 so it will once more set bit 29 to one (effectively not changing anything). There are no more rows so the end result will be a bitmap with bits 15 and 29 set to one and all other bits set to zero.

Related Solutions

Sql-server – Inaccurate ‘Actual’ row counts in parallel plan

Rows are passed across exchanges internally from producer to consumer thread in packets (hence CXPACKET - class exchange packet), rather than a row-at-a-time. There is a certain amount of buffering inside the exchange. Also, the call to shut down the pipeline from the consumer side of the Gather Streams has to be passed in a control packet back to the producer threads. Scheduling and other internal considerations mean that parallel plans always have a certain 'stopping distance'.

As a consequence, you will often see this sort of row count difference where less than the whole potential rowset of a sub-tree is actually required. In this case, the TOP brings execution to an 'early end'.

More information:

Sql-server – Finding distinct rows across two tables: Full Outer Join more efficient than Union

The semantics of the two queries are not the same - UNION removes duplicates, whereas the FULL OUTER JOIN will not:

DECLARE @T1 AS table (id bigint NULL, val integer NULL);
DECLARE @T2 AS table (id bigint NULL, val integer NULL);

INSERT @T1 (id, val) VALUES (1, 1);
INSERT @T1 (id, val) VALUES (1, 1);
INSERT @T2 (id, val) VALUES (1, 1);
INSERT @T2 (id, val) VALUES (1, 1);

SELECT COALESCE(t1.id, t2.id) AS id, COALESCE(t1.val, t2.val) AS val
FROM @t1 t1
FULL OUTER JOIN @t2 t2
    ON t2.id = t1.id
    AND t2.val = t1.val;

SELECT t1.id, t1.val
FROM @t1 t1
UNION 
SELECT t2.id, t2.val
FROM @t2 t2;

Output:

╔════╦═════╗
║ id ║ val ║
╠════╬═════╣
║  1 ║   1 ║
║  1 ║   1 ║
║  1 ║   1 ║
║  1 ║   1 ║
╚════╩═════╝

╔════╦═════╗
║ id ║ val ║
╠════╬═════╣
║  1 ║   1 ║
╚════╩═════╝

That said, the optimizer does not know many FOJN tricks, so it is always possible that there is a better way to express the query than the natural UNION. Only commonly-useful and always-correct transformations are implemented.

Note that with a unique constraint only on the larger table, the optimizer chooses a hash union, without expensive duplicate-removal on the probe input, that makes it choose Concat Union All in the question example:

ALTER TABLE #t2 
ADD CONSTRAINT UQ2 
UNIQUE CLUSTERED (id);

SELECT COUNT(*), AVG(x.id), AVG(x.val)
FROM (
    SELECT t1.id, t1.val
    FROM #t1 t1
    UNION
    SELECT t2.id, t2.val
    FROM #t2 t2
) AS x;

The FOJN rewrite may well be a useful one in cases where you know there cannot be duplicates within each input set, but this condition is not enforced with a unique constraint or index (particularly on the large input).

If such a uniqueness guarantee does exist, and yet the optimizer does not select a Hash Union, you might try an OPTION (HASH UNION) hint, to see how it compares.

Best Answer

Related Solutions

Sql-server – Inaccurate ‘Actual’ row counts in parallel plan

Sql-server – Finding distinct rows across two tables: Full Outer Join more efficient than Union

Related Question