Why does the columnstore index not fill the entire rowgroup

azure-sql-data-warehousecolumnstoreindex

I'm using [vColumnstoreDensity] to monitor the health of my columnstore indexes.

I've noticed that I've got one index that looks strange.

As far as my understanding goes, the columnstore index will fill a rowgroup before starting the next rowgroup. For best index performance, fuller row groups are better.

Having said all of this, I have one table that I'm having trouble understanding. Even after an index rebuild using the largest resource class (to give the maximum amount of memory possible to the build process) the [vColumnstoreDensity] view still shows this index as being spread across many partially full rowgroups.

COMPRESSED_rowgroup_count

4936

COMPRESSED_rowgroup_rows

2693512978

COMPRESSED_rowgroup_rows_MIN

468

COMPRESSED_rowgroup_rows_MAX

739443

COMPRESSED_rowgroup_rows_AVG

545687

Edit:

This is a clustered columnstore index.

Best Answer

There is an Extended Event that fires when a columnstore group is being 'cut', I think the event is column_store_index_build_process_segment. This event will have a 'trim' reason, and you should look for two possible trim causes:

  • memory (unlikely for your case), if the index build does not have enough memory to build a segment.
  • dictionary size (the likely cause), this happens if the dictionaries used to encode the data in a column segment reach the maximum size (16Mb).

Of course, to capture this event you need to set up an XE session during the index build (the linked article shows how).

You could also look at the post-build artifacts, specifically at the dictionaries, and see if the secondary dictionaries associated with the small segment are already at full size (16Mb). This would indicate the likely cause of trim to be full dictionary.