Sql-server – Eliminate filter operator before columnstore index scan operator

columnstoreindex-tuningsql serversql-server-2017t-sql

I have a large fact table with millions of rows called MyLargeFactTable,
and its a clustered column store table.

There is a composite primary key constraint on it there as well
(customer_id,location_id,order_date columns).

I also have a temp table #my_keys_to_filter_MyLargeFactTable,
with the very same 3 columns,
and it contains few thousand UNIQUE combination of these 3 key values.

The following query gives me back the desired result set

...
FROM #my_keys_to_filter_MyLargeFactTable AS t
JOIN dbo.MyLargeFactTable AS m
ON m.customer_id = t.customer_id
AND m.location_id = t.location_id
AND m.order_date = t.order_date

but i notice that the Index Scan Operator on the fact table returns more rows than it should (about a million) and feed it into a Filter operator, which further reduce the result set to the desired few thousand rows.

enter image description here

Index Scan operator reads way to much rows (they quite wide rows) increasing IO, and significantly slows down the whole query.

Are my parameters not sargable?

How could I remove the Filter operator and somehow force the Index Scan operator to read only the few thousand rows?

Table definitions:

create table #my_keys_to_filter_MyLargeFactTable 
(
customer_id varchar(96) not null,
location_id varchar(96) not null,
order_date date not null,
primary key clustered (customer_id,location_id,order_date)
)

create table MyLargeFactTable
(
customer_id varchar(96) not null,
location_id varchar(96) not null,
order_date date not null,
...
lot of wide decimal typed columns, and even large varchars
...
PRIMARY KEY NONCLUSTERED  (customer_id,location_id,order_date),
INDEX cci CLUSTERED COLUMNSTORE
)

Best Answer

How could I remove the Filter operator and somehow force the Index Scan operator to read only the few thousand rows?

The Filter operator is applying the bitmap built on the join columns at the hash join.

Of the three join predicates, only order_date has a data type that is supported for bitmap pushdown to the column store scan. If you look at the Predicate on the scan you should see this as something like:

PROBE([Opt_Bitmap1005],[dbo].[MyLargeFactTable].[order_date])

The remaining join predicates are strings and so will appear at the Filter as part of the full bitmap test:

PROBE([Opt_Bitmap1005],
    [dbo].[MyLargeFactTable].[customer_id],
    [dbo].[MyLargeFactTable].[location_id],
    [dbo].[MyLargeFactTable].[order_date])

Pushing (parts of) the bitmap test down into the column store scan is an optimization that is only available for data types that can fit in 64 bits (like date in your example). Note join bitmap pushdown is different from string predicate pushdown (e.g. pushing customer_id LIKE '%XYZ%').

There are several ways you could look to work around this limitation. Redesigning the schema such that long strings are moved to a dimension table and referenced using an integer key is one option.

Somewhat less intrusively, you might be able to add an integer checksum to the column store (sadly not as a persisted computed column) and temporary table, then add that into the join - e.g an integer computed from CHECKSUM(customer_id, location_id, order_date) or the like.

There would still be a Filter, but the bitmap would include the checksum column, which could be pushed into the scan. This ought to significantly reduce the number of rows passed into the Filter.