Sql-server – Rebuilding large columnstore indexed tables – am I doing it right

columnstoresql serversql server 2014sql-server-2016upgrade

We are planning to move the database from SQL 2014 ENT into 2016 ENT.
We realized the data currently in columnstore table are not aligned in time when they are loaded, so we need to re-align the data.

Stats:

3 big tables (columnstore indexed)
the big tables has 60+ billion rows
4 data files (round robin)
partitioned by monthly
SQL Server 2014 ENT, 128 GB RAM
High Performance VM platform, 32 CPUs

File groups/datafiles:

PRIMARY
DATA (4 data files, each data file on 2TB disk)
LOG

Parition details:

Monthly partition function
All paritiona scheme on DATA FG

Table sizes (#rows):

T1 34,807,580,311
T2 16,458,306,369
T3 10,170,792,290

What I plan to do:

drop the columnstore index, create row-store index (this is to align the data in logtime)
then drop the row store index, convert the table into column store index (better compression and queries later)

I am concern this might took extremely long time to rebuild. Am I taking the right approach?

DDL:

CREATE TABLE [dbo].[T1](
[Id] [int] NOT NULL,
[C1] [smallint] NOT NULL,
[C2] [int] NOT NULL,
[C3] [int] NOT NULL,
[C4] [int] NOT NULL,
[C5] [datetime2](0) NOT NULL,
[C6] [real] NULL,
[C7] [real] NULL,
[C8] [real] NULL,
[C9] [real] NULL,
[C10] [real] NULL,
[C11] [datetime2](3) NULL,
[C12] [tinyint] NULL
) ON [DATA]

CREATE CLUSTERED COLUMNSTORE INDEX [T1_ColumnStoreIndex] ON [T1]

CREATE TABLE [dbo].[T2](
    [C1] [int] NOT NULL,
    [C2] [smallint] NOT NULL,
    [C3] [int] NOT NULL,
    [C4] [int] NOT NULL,
    [C5] [int] NOT NULL,
    [C6] [datetime2](0) NOT NULL,
    [C7] [real] NULL,
    [C8] [int] NULL,
    [C9] [int] NULL,
    [C10] [tinyint] NULL,
    [C11] [tinyint] NULL,
    [C12] [tinyint] NULL
) ON [DATA]

CREATE CLUSTERED COLUMNSTORE INDEX [T2_ColumnStoreIndex] ON [T2]

CREATE TABLE [dbo].[T3](
    [C1] [int] NOT NULL,
    [C2] [smallint] NOT NULL,
    [C3] [int] NOT NULL,
    [C4] [int] NOT NULL,
    [C5] [int] NOT NULL,
    [C6] [datetime2](0) NOT NULL,
    [C7] [real] NULL,
    [C8] [real] NULL,
    [C9] [real] NULL,
    [C10] [real] NULL,
    [C11] [real] NULL,
    [C12] [real] NULL,
    [C13] [real] NULL,
    [C14] [varchar](50) NULL,
    [C15] [datetime] NULL,
    [C16] [int] NULL,
    [C17] [int] NULL,
    [C18] [float] NULL,
    [C19] [tinyint] NULL,
    [C20] [tinyint] NULL,
    [C21] [datetime2](3) NULL,
    [C22] [tinyint] NULL
) ON [DATA]

CREATE CLUSTERED COLUMNSTORE INDEX [T3_ColumnStoreIndex] ON [T3]

This may not be Q&A but please I believe I can gain valuable inputs here.

Best Answer

What I plan to do: •drop the columnstore index, create row-store index (this is to align the data in logtime) •then drop the row store index, convert the table into column store index (better compression and queries later)

You can use CREATE CLUSTERED INDEX...WITH (DROP_EXISTING=ON) to change the existing columnstore clustered index to a b-tree and then CREATE CLUSTERED COLUMNSTORE INDEX...WITH (DROP_EXISTING=ON) to change back to columnstore. This will eliminate the drop index step that changes the columnstore to a heap.

I still expect this will take quite some time with 60 billions rows. Although source data might not have been loaded in C6 order, data overall are typically loaded in roughly chronological order so there will still be a temporal relationship for data in the same proximity. I suggest you review the min and max values in sys.column_store_segments to see if this effort is justified, considering that segment elimination is done after partition elimination. The performance benefit might not be a great as you think.

Best Answer

Related Solutions

Sql-server – Database design for handling 1 billion rows and counting

Sql-server – Clustered columnstore indexes and foreign keys

Related Question