Sql-server – Rebuilding large columnstore indexed tables – am I doing it right

columnstoresql serversql server 2014sql-server-2016upgrade

We are planning to move the database from SQL 2014 ENT into 2016 ENT.
We realized the data currently in columnstore table are not aligned in time when they are loaded, so we need to re-align the data.

Stats:

  • 3 big tables (columnstore indexed)
  • the big tables has 60+ billion rows
  • 4 data files (round robin)
  • partitioned by monthly
  • SQL Server 2014 ENT, 128 GB RAM
  • High Performance VM platform, 32 CPUs

File groups/datafiles:

  • PRIMARY
  • DATA (4 data files, each data file on 2TB disk)
  • LOG

Parition details:

  • Monthly partition function
  • All paritiona scheme on DATA FG

Table sizes (#rows):

  • T1 34,807,580,311
  • T2 16,458,306,369
  • T3 10,170,792,290

What I plan to do:

  • drop the columnstore index, create row-store index (this is to align the data in logtime)
  • then drop the row store index, convert the table into column store index (better compression and queries later)

I am concern this might took extremely long time to rebuild. Am I taking the right approach?

DDL:

CREATE TABLE [dbo].[T1](
[Id] [int] NOT NULL,
[C1] [smallint] NOT NULL,
[C2] [int] NOT NULL,
[C3] [int] NOT NULL,
[C4] [int] NOT NULL,
[C5] [datetime2](0) NOT NULL,
[C6] [real] NULL,
[C7] [real] NULL,
[C8] [real] NULL,
[C9] [real] NULL,
[C10] [real] NULL,
[C11] [datetime2](3) NULL,
[C12] [tinyint] NULL
) ON [DATA]

CREATE CLUSTERED COLUMNSTORE INDEX [T1_ColumnStoreIndex] ON [T1]

CREATE TABLE [dbo].[T2](
    [C1] [int] NOT NULL,
    [C2] [smallint] NOT NULL,
    [C3] [int] NOT NULL,
    [C4] [int] NOT NULL,
    [C5] [int] NOT NULL,
    [C6] [datetime2](0) NOT NULL,
    [C7] [real] NULL,
    [C8] [int] NULL,
    [C9] [int] NULL,
    [C10] [tinyint] NULL,
    [C11] [tinyint] NULL,
    [C12] [tinyint] NULL
) ON [DATA]

CREATE CLUSTERED COLUMNSTORE INDEX [T2_ColumnStoreIndex] ON [T2]

CREATE TABLE [dbo].[T3](
    [C1] [int] NOT NULL,
    [C2] [smallint] NOT NULL,
    [C3] [int] NOT NULL,
    [C4] [int] NOT NULL,
    [C5] [int] NOT NULL,
    [C6] [datetime2](0) NOT NULL,
    [C7] [real] NULL,
    [C8] [real] NULL,
    [C9] [real] NULL,
    [C10] [real] NULL,
    [C11] [real] NULL,
    [C12] [real] NULL,
    [C13] [real] NULL,
    [C14] [varchar](50) NULL,
    [C15] [datetime] NULL,
    [C16] [int] NULL,
    [C17] [int] NULL,
    [C18] [float] NULL,
    [C19] [tinyint] NULL,
    [C20] [tinyint] NULL,
    [C21] [datetime2](3) NULL,
    [C22] [tinyint] NULL
) ON [DATA]

CREATE CLUSTERED COLUMNSTORE INDEX [T3_ColumnStoreIndex] ON [T3]

This may not be Q&A but please I believe I can gain valuable inputs here.

Best Answer

What I plan to do: •drop the columnstore index, create row-store index (this is to align the data in logtime) •then drop the row store index, convert the table into column store index (better compression and queries later)

You can use CREATE CLUSTERED INDEX...WITH (DROP_EXISTING=ON) to change the existing columnstore clustered index to a b-tree and then CREATE CLUSTERED COLUMNSTORE INDEX...WITH (DROP_EXISTING=ON) to change back to columnstore. This will eliminate the drop index step that changes the columnstore to a heap.

I still expect this will take quite some time with 60 billions rows. Although source data might not have been loaded in C6 order, data overall are typically loaded in roughly chronological order so there will still be a temporal relationship for data in the same proximity. I suggest you review the min and max values in sys.column_store_segments to see if this effort is justified, considering that segment elimination is done after partition elimination. The performance benefit might not be a great as you think.