AFAIK columnstore indexes store min/max values for segments (1 million rows), so this information can be used to prune not needed data.
This feature seems to work with equality predicates, but not with more complex cases like joins or subquery filters.
In SQL Server 2014, is it possible to use columnstore metadata filtering with joins or subqueries?
Below is my test case:
--create table generate_series with 67_108_864 rows (runs about 5 mins)
set nocount on;
create table generate_series(c bigint);
insert into generate_series select 1;
declare @i int = 26;
while @i != 0
begin
insert into generate_series WITH (TABLOCK)
select
(select count(*) from generate_series)
+ row_number() over (order by (select null)) as c
from generate_series;
set @i = @i - 1;
end;
create nonclustered columnstore index cstore on generate_series(c);
--Test queries
-- 0 seconds (very fast)
select * from generate_series where c = 100;
-- 8 seconds (slow)
select 100 as c into #tmp;
select * from generate_series where c in (select c from #tmp);
Best Answer
The performance optimization that you are describing is known as rowgroup elimination. You can get rowgroup elimination with both queries in both SQL Server 2016 and SQL Server 2014 but SQL Server 2014 has additional restrictions.
Let's test in SQL Server 2016 first. The following query finishes very quickly:
SQL Server 2016 offers additional diagnostic information about rowgroup elimination through
SET STATISTICS IO ON;
so let's take advantage of that:This query is also fast:
Statistics output:
This query is fast as well:
For each query SQL Server is able to skip 68 rowgroups. It only reads data from the rowgroup that contains an ID of 100. In the plan for the final query we can see that an optimized bitmap was applied:
That bitmap allows rowgroup elimination through a join.
In SQL Server 2014 I get similar performance results for the queries except for the last one that has a join and
MAXDOP
of 1. That one takes 21 seconds to run. Here is the query plan:There is no bitmap and the scan isn't running in batch mode. We send back all 67 million rows to the hash join.
In SQL Server 2014 serial plans are not eligible for batch mode. If your query is running in serial for some reason that could explain the poor performance that you experienced. Note that the query that directly filters against
c = 100
also runs in row mode but it still finishes quickly.