Columnstore Structure
Columnstore data is physically stored in one or more segments (regular LOB allocation units) per column, and may also be partitioned in the usual way. Each segment contains roughly one million rows of highly-compressed values or value references (several compression techniques are available). A value reference links to an entry in one of up to two hash dictionaries.
Dictionaries are pinned in memory during query execution, with data value IDs from the segment being looked in the dictionary whenever execution requires the actual data value (this lookup is deferred for as long as possible for performance reasons).
Segments also have a header record containing metadata such as the minimum and maximum values stored in the segment. Information from the header can often be used to eliminate complete partitions from processing at execution time. Header record information is stored in the usual LOB data root structure, so eliminating a segment means the Storage Engine can skip reading the LOB data pages from physical storage entirely. Maximizing the potential for elimination can require careful design, including a dependency on the clustered index order at the time the Columnstore index is built.
Specific Plan Operators
SQL Server 2012 introduces a new execution mode called Batch Mode. In this mode, packets of roughly 1000 rows are passed between operators, significantly improving processor utilization efficiency. Within each packet, columnar data is represented as a vector. Not all plan operators support batch mode operation, but examples of those that do include Columnstore Index Scan, Hash Inner Join, Batch Hash Table Build, Bitmap Filter, Hash Aggregate (not scalar aggregates), Filter, and Compute Scalar (for projection and expression evaluation). Query execution plans have been enhanced to show estimated and actual execution mode.
Anti-Patterns
There are a large number of restrictions in the first release, including constraints on allowable data types. Most common types are supported; unsupported data types include DECIMAL
with a precision greater than 18 digits, (N)VARCHAR(MAX)
, UNIQUEIDENTIFIER
, CLR types, and (VAR)BINARY
.
Use of string types, OUTER JOIN
, IN
, EXISTS
, NOT IN
, OR
, UNION ALL
may result in significantly reduced performance (Row Mode execution), unless workarounds are employed that typically involve unusual syntax rewrites as shown in the linked articles in this section.
More Information
Remus Rusanu has blogged a great overview here.
No, order does not matter. Each column is considered individually.
From the SQL Server team (emphasis added):
Typically you will want to put all the columns in your table into the columnstore index. It does not matter what order you list the columns because a columnstore index does not have a key like a B-tree index does. Internally, the data will be re-ordered automatically to get the best compression.
Best Answer
There is an Extended Event that fires when a columnstore group is being 'cut', I think the event is
column_store_index_build_process_segment
. This event will have a 'trim' reason, and you should look for two possible trim causes:Of course, to capture this event you need to set up an XE session during the index build (the linked article shows how).
You could also look at the post-build artifacts, specifically at the dictionaries, and see if the secondary dictionaries associated with the small segment are already at full size (16Mb). This would indicate the likely cause of trim to be full dictionary.