Sql-server – Identity column in columnstore index

columnstoresql serversql-server-2012

I have an extremely large table IMO (~137 million rows) with lots of repeated data, lots of NULL columns, and such.

I am considering exploring this using a table with a COLUMNSTORE INDEX and I have an IDENTITY column in the original table, which is my only column where every row is unique.

Should I leave this column out or include it? I have read that you want to include all rows of your table into the COLUMNSTORE INDEX but I have also read that the best candidates are columns with lots of non-unique rows.

Is this just a bad candidate for a COLUMNSTORE INDEX?

I am using SQL Server 2012, so it is a nonclustered columnstore. I am just exploring possible better ways to store this data. Updates are nonexistent, although new rows will periodically be added through an ELT process so I am assuming some work would be done there. Some folks mine this data and generate huge reports, lots of scanning of rows, brings the server to a crawl at times which has forced us to offload a copy daily to a secondary server.

Best Answer

Identity columns are not truly compressed in Columnstore Indexes in SQL Server 2012 or in SQL Server 2014. It will all truly depend on the workload that you are experiencing. If your workload will include the identity column, then you can very beautifully take advantage of Segment elimination.

From the compression point of view - Columnstore will provide you with better compression than page typically does. Typically. Please test it before advancing to production.

Your biggest problem in SQL Server 2012 will be a very weak implementation of the Batch Mode, and there is nothing you can do about it.