Sql-server – How to control Segmentation min/max data_id on a non-clustered ColumnStore index

columnstoresql serversql-server-2019

Given a simple row-based table without a PK but with a row-based clustered index like so:

create clustered index [CX_PropertyValue] ON [dbo].[PropertyValue] ([PropertyId], [Value])

Then I wish to add a column store index that is segmented in the same order as the clustered index above:

create nonclustered columnstore index CS_IX_PropertyValue on dbo.PropertyValue( 
    PropertyId, Value
)
with (drop_existing = on, maxdop = 1); -- maxdop=1 to preserve the order by property 

MaxDop hint to preserve order came from: here

Then the following query was used to report the min/max data_id for the PropertyId column and it the full range was reported on each of the 7 segments:

create view [Common].[ColumnStoreSegmentationView]
as
/*---------------------------------------------------------------------------------------------------------------------
    Purpose: List ColumnStore table segment min/max of columns.

     Source: https://joyfulcraftsmen.com/blog/cci-how-to-load-data-for-better-columnstore-segment-elimination/
             https://dba.stackexchange.com/a/268329/9415

    Modified    By            Description
    ----------  ----------    -----------------------------------------------------------------------------------------
    2020.06.02  crokusek/inet Initial Version 
  ---------------------------------------------------------------------------------------------------------------------*/
select --top 20000000000
       s.Name as SchemaName, 
       t.Name as TableName,
       i.Name as IndexName,
       c.name as ColumnName,
       c.column_id as ColumnId,
       cs.segment_id as SegmentId,
       cs.min_data_id as MinValue,
       cs.max_data_id as MaxValue
  from sys.schemas s
  join sys.tables t
    on t.schema_id = s.schema_id
  join sys.partitions as p  
    on p.object_id = t.object_id   
  join sys.indexes as I
    on i.object_id = p.object_id
   and i.index_id = p.index_id
  join sys.index_columns as ic
    on ic.[object_id] = I.[object_id]
   and ic.index_id = I.index_id   
  join sys.columns c
    on c.object_id = t.object_id
   and c.column_id = ic.column_id
  join sys.column_store_segments cs
    on cs.hobt_id = p.hobt_id
   and cs.column_id = ic.index_column_id 
 --order by s.Name, t.Name, i.Name, c.Name, cs.Segment_Id
GO

I tried making the clustered index unique which did slightly affect the reported ranges but still was not monotonically increasing.

Any ideas?

Here is a Link that accomplished the segmentation in this manner but I don't see any difference.

Version: Microsoft SQL Server 2019 (RTM) – 15.0.2000.5 (X64)

Best Answer

This isn't directly supported for nonclustered columnstore indexes.

It does work for clustered columnstore.

Azure Synapse Analytics has language support for doing it in one step e.g.:

CREATE CLUSTERED COLUMNSTORE INDEX <index_name>
ON dbo.PropertyValue
ORDER (PropertyId, Value);

This syntax has not yet made it to the SQL Server box product, though it is available under an undocumented feature flag so perhaps it isn't far away. It still won't work on a nonclustered columnstore index though.

General Workaround

The best you can do is to create the nonclustered rowstore index with MAXDOP = 1, then replace it with a nonclustered columnstore index with MAXDOP = 1 and DROP_EXISTING = ON.

This isn't guaranteed to preserve the ordering as you want, but it is highly likely:

CREATE NONCLUSTERED INDEX CS_IX_PropertyValue
ON dbo.PropertyValue (PropertyId, Value)
WITH (MAXDOP = 1);

CREATE NONCLUSTERED COLUMNSTORE INDEX CS_IX_PropertyValue
ON dbo.PropertyValue (PropertyId, Value)
WITH (DROP_EXISTING = ON, MAXDOP = 1);

This will give you your best chance of achieving rowgroup elimination when filtering on PropertyId.

Special Case

When the desired ordering matches the rowstore clustered index (as appears to be the case in the question), there is no need to create a rowstore nonclustered index first. The documentation says:

Note, for nonclustered columnstore index (NCCI), if the base rowstore table has a clustered index, the rows are already ordered. In this case, the resultant nonclustered columnstore index will automatically be ordered.

So, in your case, it should be enough to run only:

CREATE NONCLUSTERED COLUMNSTORE INDEX CS_IX_PropertyValue
ON dbo.PropertyValue (PropertyId, Value)
WITH (MAXDOP = 1);

See this db<>fiddle demo.

fiddle results

Metadata

You can see the min and max values for each rowgroup and column using:

SELECT
    CSS.column_id,
    column_name = C.[name],
    rowgroup_id = CSS.segment_id,
    CSS.min_data_id,
    CSS.max_data_id,
    CSS.row_count
FROM sys.partitions AS P
JOIN sys.column_store_segments AS CSS
    ON CSS.hobt_id = P.hobt_id
JOIN sys.indexes AS I
    ON I.[object_id] = P.[object_id]
    AND I.index_id = P.index_id
JOIN sys.index_columns AS IC
    ON IC.[object_id] = I.[object_id]
    AND IC.index_id = I.index_id
    AND IC.index_column_id = CSS.column_id
JOIN sys.columns AS C
    ON C.[object_id] = P.[object_id]
    AND C.column_id = IC.column_id
WHERE
    P.[object_id] = OBJECT_ID(N'dbo.PropertyValue', N'U')
ORDER BY
    C.column_id,
    CSS.segment_id;