Columnstore Structure
Columnstore data is physically stored in one or more segments (regular LOB allocation units) per column, and may also be partitioned in the usual way. Each segment contains roughly one million rows of highly-compressed values or value references (several compression techniques are available). A value reference links to an entry in one of up to two hash dictionaries.
Dictionaries are pinned in memory during query execution, with data value IDs from the segment being looked in the dictionary whenever execution requires the actual data value (this lookup is deferred for as long as possible for performance reasons).
Segments also have a header record containing metadata such as the minimum and maximum values stored in the segment. Information from the header can often be used to eliminate complete partitions from processing at execution time. Header record information is stored in the usual LOB data root structure, so eliminating a segment means the Storage Engine can skip reading the LOB data pages from physical storage entirely. Maximizing the potential for elimination can require careful design, including a dependency on the clustered index order at the time the Columnstore index is built.
Specific Plan Operators
SQL Server 2012 introduces a new execution mode called Batch Mode. In this mode, packets of roughly 1000 rows are passed between operators, significantly improving processor utilization efficiency. Within each packet, columnar data is represented as a vector. Not all plan operators support batch mode operation, but examples of those that do include Columnstore Index Scan, Hash Inner Join, Batch Hash Table Build, Bitmap Filter, Hash Aggregate (not scalar aggregates), Filter, and Compute Scalar (for projection and expression evaluation). Query execution plans have been enhanced to show estimated and actual execution mode.
Anti-Patterns
There are a large number of restrictions in the first release, including constraints on allowable data types. Most common types are supported; unsupported data types include DECIMAL
with a precision greater than 18 digits, (N)VARCHAR(MAX)
, UNIQUEIDENTIFIER
, CLR types, and (VAR)BINARY
.
Use of string types, OUTER JOIN
, IN
, EXISTS
, NOT IN
, OR
, UNION ALL
may result in significantly reduced performance (Row Mode execution), unless workarounds are employed that typically involve unusual syntax rewrites as shown in the linked articles in this section.
More Information
Remus Rusanu has blogged a great overview here.
The reason for this is that the "fixed" physical location of your row - the RID (or row identifier) might (and will!) change over time - think page splits that occur when a row needs to be inserted into a table on a page that's already full.
Updating those RIDs in all the nonclustered indices that exist on a given table is quickly becoming both a hassle, and a huge performance killer. You might have 5, 10, 20 nonclustered indices on your table, and SQL Server would have to scan all those indices (basically scanning the whole index, all rows in the index, and that 10, 20 times) and update all RIDs.... that's just not practical - very quickly so.
If you store the value of the clustering index as a "row pointer" instead, that value should typically never change - and most definitely it doesn't need to be updated every time a page is split. Yes, it does involve a second index seek operation - the key lookup - but for simple scenarios, retrieving a single row or a few rows, that's still much more efficient than anything else.
Best Answer
The "bookmark" is the columnstore index original locator (per "Pro SQL Server Internals" by Dmitri Korotkevitch). This is an 8-byte value, with the columnstore index's
row_group_id
in the first 4-bytes and an offset in the second 4-bytes.If you use
DBCC PAGE
to look at the non-clustered index, the 8-byte columnstore index original locator appears in the "uniquifier" column of theDBCC PAGE
output. This shows that a unique non-clustered index does not need to include the columnstore row locator, whereas a non-unique non-clustered index does.The following code creates a columnstore-organized table with a unique and non-unique b-tree nonclustered index on the same column:
We can see the size of the index row at different levels of the b-tree using
sys.dm_db_index_physical_stats
:The output is:
Both structures have the same row size at the leaf level, but the nonunique nonclustered index is 12 bytes larger than the unique nonclustered index at the non-leaf levels due to the 8-byte columnstore locator, plus 4 bytes of overhead for the first variable-length column in a row (uniquifier is variable length).