Sql-server – Can adding a columnstore index to a table affect read performance of a query that uses a rowstore index on the same table

Tags: columnstore, index, nonclustered-index, sql server, sql-server-2016

I'm doing some testing of columnstore indexing on a single table that has about 500 million rows.
The performance gains on aggregate queries have been awesome (a query that aggregates the entire table previously took about 2 minutes and now runs in 0 seconds).

But I also noticed that another test query, which seeks on an existing rowstore index on the same table, now runs about 4x slower than it did before I created the columnstore index. I can reproduce this repeatedly: with the columnstore index dropped, the rowstore query runs in 5 seconds, and with the columnstore index added back, the same query runs in 20 seconds.

I'm keeping an eye on the actual execution plan for the rowstore index query, and it's almost exactly the same in both cases, regardless of whether the columnstore index exists. (It uses the rowstore index in both cases.)

The rowstore test query is:

SELECT *
INTO #TEMP
FROM Table1 WITH (FORCESEEK)
WHERE IntField1 = 571
    AND DateField1 >= '6/01/2020'

The rowstore index used in this query is: CREATE NONCLUSTERED INDEX IX_Table1_1 ON Table1 (IntField1, DateField1) INCLUDE (IntField2)

The columnstore test query is:

SELECT COUNT(DISTINCT IntField2) AS IntField2_UniqueCount, COUNT(1) AS [RowCount] -- ROWCOUNT is a reserved keyword, so the alias is bracketed
FROM Table1
WHERE IntField1 = 571 -- Some other test columnstore queries also don't use any WHERE predicates on this table
    AND DateField1 >= '1/1/2019'

The columnstore index is: CREATE NONCLUSTERED COLUMNSTORE INDEX IX_Table1_2 ON Table1 (IntField2, IntField1, DateField1)

Here is the execution plan for the rowstore index query before I create the columnstore index:

Here is the execution plan for the rowstore index query after I create the columnstore index:

The only differences I notice between the two plans are that the Sort operator's warning goes away after creating the columnstore index, and that the Key Lookup and Table Insert (#TEMP) operators take significantly longer.

Here is the Sort operation's info with the warning (before creating the columnstore index):

Here's the Sort operation's info without the warning (after creating the columnstore index):

I would've thought a read query that is specifically leveraging the same rowstore index and execution plan in both cases should have roughly the same performance on every run, regardless of what other indexes exist on that table. What gives here?

Edit: Here's the TIME and IO stats before creating the index:

Here's the stats after creating the columnstore index:

Best Answer

Adding the nonclustered columnstore index allows for a batch mode sort in the second execution plan. This causes all of the processing to be done on one thread - so even though the query has a parallel plan, it's essentially running serially. You can see that by looking at the details of the different operators.

I reproduced your problem locally. Here's the sort operator with per-thread row counts; as you can see, everything is on thread 1:

Notice the "Actual Execution Mode" is "Batch."

Everything after the sort (the nested loops join, key lookup, etc) is essentially serial, which is what slows the query down.

See this KB article for details and possible solutions:

Adds trace flag 9358 to disable batch mode sort operations in a complex parallel query in SQL Server 2016

Batch mode sorts were introduced in SQL Server 2016 under compatibility level 130. If a query execution plan contains parallel batch mode sorts in conjunction with directly-upstream parallel operators, you may encounter degraded performance compared to row mode sort plan equivalents.

This occurs due to a parallel batch sort outputting fully sorted data via a single thread to the upstream parallel operator (for example, a parallel merge join operator). The performance degradation occurs when the upstream parallel operator uses single-threaded processing due to the incoming single-threaded batch mode sort operator.

For completeness, the options outlined there are either:

enable TF 9358
enable query optimizer hotfixes (through TF 4199, the QUERY_OPTIMIZER_HOTFIXES database option, or the ENABLE_QUERY_OPTIMIZER_HOTFIXES query hint)
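
As a rough sketch, this is what those options might look like when applied to the test query from the question. The temp table names #TEMP1 and #TEMP2 are hypothetical (used only so the two statements can run in one session). The USE HINT syntax assumes SQL Server 2016 SP1 or later, and QUERYTRACEON normally requires elevated (sysadmin-level) permissions.

```sql
-- Option 1: disable batch mode sort for this query only (TF 9358)
SELECT *
INTO #TEMP1  -- hypothetical temp table name
FROM Table1 WITH (FORCESEEK)
WHERE IntField1 = 571
    AND DateField1 >= '6/01/2020'
OPTION (QUERYTRACEON 9358);

-- Option 2a: enable query optimizer hotfixes for this query only
SELECT *
INTO #TEMP2  -- hypothetical temp table name
FROM Table1 WITH (FORCESEEK)
WHERE IntField1 = 571
    AND DateField1 >= '6/01/2020'
OPTION (USE HINT ('ENABLE_QUERY_OPTIMIZER_HOTFIXES'));

-- Option 2b: enable query optimizer hotfixes for the whole database
ALTER DATABASE SCOPED CONFIGURATION SET QUERY_OPTIMIZER_HOTFIXES = ON;
```

The query-level hints keep the change scoped to the one statement that regressed, which is usually the least risky way to test whether the batch mode sort is really the cause.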

Getting rid of the sort is another solution for this problem. The sort is only present to try and prevent too much random I/O from the nested loops join, which is using unordered prefetch, as mentioned in this article by Craig Freedman:

Optimizing I/O Performance by Sorting – Part 1

The plan uses the non-clustered index to avoid unnecessarily touching many rows. Yet, performing 64,000 random I/Os is still rather expensive so SQL Server adds a sort. By sorting the rows on the clustered index key, SQL Server transforms the random I/Os into sequential I/Os.

You can get rid of the sort by:

eliminating the need for the key lookup (by selecting fewer columns, or creating a covering nonclustered index)
disabling nested loops prefetching by adding (undocumented, unsupported trace flag) OPTION (QUERYTRACEON 9115) to the query
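
A sketch of both approaches against the question's table. The narrower column list and the temp table names are illustrative assumptions; whether three columns are enough depends on what the real query needs.

```sql
-- Approach 1: request only columns that IX_Table1_1 already covers
-- (IntField1 and DateField1 are keys; IntField2 is an included column),
-- so the plan needs no key lookup and no sort ahead of it.
SELECT IntField1, DateField1, IntField2
INTO #TEMP_Covered  -- hypothetical temp table name
FROM Table1 WITH (FORCESEEK)
WHERE IntField1 = 571
    AND DateField1 >= '6/01/2020';

-- Approach 2: keep SELECT * but disable nested loops prefetching.
-- Trace flag 9115 is undocumented and unsupported; use for testing only.
SELECT *
INTO #TEMP_NoPrefetch  -- hypothetical temp table name
FROM Table1 WITH (FORCESEEK)
WHERE IntField1 = 571
    AND DateField1 >= '6/01/2020'
OPTION (QUERYTRACEON 9115);
```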

Related Solutions

Sql-server – What exactly can SQL Server 2014 execute in batch mode

What exactly can run in batch mode as of SQL Server 2014?

SQL Server 2014 adds the following to the original list of batch mode operators:

Hash Outer join (including full join)
Hash Semi Join
Hash Anti Semi Join
Union All (Concatenation only)
Scalar hash aggregate (no group by)
Batch Hash Table Build removed

It seems that data can transition into batch mode even if it does not originate from a columnstore index.

SQL Server 2012 was very limited in its use of batch operators. Batch mode plans had a fixed shape, relied on heuristics, and could not restart batch mode once a transition to row-mode processing had been made.

SQL Server 2014 adds the execution mode (batch or row) to the query optimizer's general property framework, meaning it can consider transitioning into and out of batch mode at any point in the plan. Transitions are implemented by invisible execution mode adapters in the plan. These adapters have a cost associated with them to limit the number of transitions introduced during optimization. This new flexible model is known as Mixed Mode Execution.

The execution mode adapters can be seen in the optimizer's output (though sadly not in user-visible execution plans) with undocumented TF 8607. For example, the following was captured for a query counting rows in a row store:

Row to Batch to Row adapters

Is using a columnstore index a formal requirement that is necessary to make SQL Server consider batch mode?

It is today, yes. One possible reason for this restriction is that it naturally constrains batch mode processing to Enterprise Edition.

Could we maybe add a zero row dummy table with a columnstore index to induce batch mode?

Yes, this works. I have also seen people cross-joining with a single-row clustered columnstore index for just this reason. The suggestion you made in the comments to left join to a dummy columnstore table on false is terrific.

-- Demo the technique (no performance advantage in this case)
--
-- Row mode everywhere
SELECT COUNT_BIG(*) FROM dbo.FactOnlineSales AS FOS;
GO
-- Dummy columnstore table
CREATE TABLE dbo.Dummy (c1 int NULL);
CREATE CLUSTERED COLUMNSTORE INDEX c ON dbo.Dummy;
GO
-- Batch mode for the partial aggregate
SELECT COUNT_BIG(*) 
FROM dbo.FactOnlineSales AS FOS
LEFT OUTER JOIN dbo.Dummy AS D ON 0 = 1;

Plan with dummy left outer join:

Documentation is thin

True.

The best official sources of information are Columnstore Indexes Described and SQL Server Columnstore Performance Tuning.

SQL Server MVP Niko Neugebauer has a terrific series on columnstore in general here.

There are some good technical details about the 2014 changes in the Microsoft Research paper, Enhancements to SQL Server Column Stores (pdf) though this is not official product documentation.

Sql-server – Code creating clustered columnstore index while maintaining row order

A clustered columnstore index is fundamentally different from a clustered rowstore index. You may have noticed there is no key column specification for a clustered columnstore index. That's right: a clustered columnstore index is an index with no keys - all columns are 'included'.

The most intuitive description I have heard for a clustered columnstore index is to think of it as a column-oriented heap table (where the 'RID' is rowgroup_id, row_number).

If you need indexes to support direct ordering and/or point/small range selections, you can create updateable rowstore b-tree indexes on top of clustered columnstore in SQL Server 2016.

In many cases this is simply not necessary, since columnstore access and batch mode sorting are so fast. Many of the things people 'know' about rowstore performance need to be relearned for columnstore. Scans and hashes are good :)

That said, of course columnstore has a structure to its row groups (and metadata about min/max values in each segment), which can be useful in queries that can benefit from row group/segment elimination.

One important technique in this area is to first create a clustered rowstore index with the desired ordering, then create the clustered columnstore index using the WITH (DROP_EXISTING = ON, MAXDOP = 1) option. In your example:

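-- Step 1: build a rowstore clustered index with the desired ordering
-- (write CREATE UNIQUE CLUSTERED INDEX instead if (id, time) is unique)
CREATE CLUSTERED INDEX idx 
ON dbo.tab1_cstore (id, time)
WITH (MAXDOP = 1);

-- Step 2: replace it with a clustered columnstore index of the same name;
-- MAXDOP = 1 avoids parallel threads interleaving the ordered input
CREATE CLUSTERED COLUMNSTORE INDEX idx 
ON dbo.tab1_cstore
WITH (DROP_EXISTING = ON, MAXDOP = 1);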

Care is needed to maintain the benefits of row group/segment elimination over time. Also, while columnstore is already implicitly partitioned by row group, you can explicitly partition it as well.
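
One way to check whether segments are still well ordered is to look at the per-segment min/max metadata. The query below is a minimal sketch, assuming the table and the `id` column from the example above and a clustered columnstore index (where segment column_id lines up with sys.columns). Note that min_data_id and max_data_id are only directly readable for some data types such as integers; for other types they may be encoded values.

```sql
-- Inspect per-segment min/max metadata for one column of the
-- clustered columnstore index. Narrow, mostly non-overlapping
-- [min_data_id, max_data_id] ranges across segments suggest that
-- row group/segment elimination can still work well for that column.
SELECT
    c.name AS column_name,
    s.segment_id,
    s.row_count,
    s.min_data_id,
    s.max_data_id
FROM sys.column_store_segments AS s
JOIN sys.partitions AS p
    ON p.hobt_id = s.hobt_id
JOIN sys.columns AS c
    ON c.object_id = p.object_id
    AND c.column_id = s.column_id
WHERE p.object_id = OBJECT_ID(N'dbo.tab1_cstore')
    AND c.name = N'id'  -- assumed ordering column from the example
ORDER BY s.segment_id;
```

If the ranges start to overlap heavily after many loads or rebuilds, repeating the two-step rebuild is one way to restore the ordering.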

I'm not 100% sure what you're looking to test, but it is true that the 'order' of values within a segment is determined by the compression algorithm. My point about creating the columnstore index with DROP_EXISTING is about the ordering of data flowing into the segment creation process, so that segments overall will be ordered in a particular way. Within the segment, all bets are off.
