SQL Server – Using sp_estimate_data_compression_savings for Columnstore Compression

columnstore, compression, sql-server, sql-server-2014

Is there a way to estimate the compression savings for columnstore and columnstore archival compression?

In order to check these compression savings I have to drop all clustered and nonclustered indexes and then create a clustered columnstore index. Most of the time this is a time-consuming operation, and I am wondering whether there is another way to do it.
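For reference, the slow approach I am using now looks roughly like this (the index and constraint names are hypothetical):

DROP INDEX IX_yourTable_col1 ON dbo.yourTable;            -- repeat for each nonclustered index
ALTER TABLE dbo.yourTable DROP CONSTRAINT PK_yourTable;   -- a clustered PK has to go too
CREATE CLUSTERED COLUMNSTORE INDEX cci_yourTable ON dbo.yourTable;  -- slow on a large table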

Best Answer

There is no equivalent proc (on SQL Server 2014, sp_estimate_data_compression_savings only accepts NONE, ROW and PAGE), so what I have been doing recently is creating a copy of the table in tempdb using SELECT...INTO, e.g.

-- copy an arbitrary 10% sample into tempdb; the new table is a
-- heap with no indexes, and the load can be minimally logged
SELECT TOP 10 PERCENT *
INTO tempdb.dbo.yourTable
FROM dbo.yourTable;

then apply the clustered columnstore, as sketched below. If your table is too large to fit in your tempdb (as I might expect for a data warehouse table), transfer a smaller percentage and infer from that. You could also create the empty table in tempdb, add the columnstore, then populate it, although this will probably take longer - compression has quite an effect on INSERT time. The SELECT...INTO technique has the added benefit of removing the indexes for you; it can be minimally logged and can even run in parallel on SQL Server 2014.
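A minimal sketch of those steps, assuming the sample table created above (the index name cci_yourTable is illustrative):

USE tempdb;
GO

-- compress the sample with a clustered columnstore index
CREATE CLUSTERED COLUMNSTORE INDEX cci_yourTable ON dbo.yourTable;

-- size after normal columnstore compression
EXEC sp_spaceused 'dbo.yourTable';

-- rebuild with archival compression and compare
ALTER INDEX cci_yourTable ON dbo.yourTable
    REBUILD WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);

EXEC sp_spaceused 'dbo.yourTable';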

Example: I have a 1TB table and copy 100GB (10%) into a table in tempdb. I apply the clustered columnstore index (normal compression) and check the table size: it has packed down to 10GB, i.e. 10x compression. I rebuild the clustered columnstore with COLUMNSTORE_ARCHIVE and the table packs down to 5GB, or 20x compression. I can infer that my 1TB table will pack down to approximately 100GB, or 50GB with archival mode. Allow some tolerance, as it's an estimate.
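If you prefer to do the arithmetic in T-SQL, something like this reads the sample's reserved size and scales it up (a sketch; the table name and the 10% fraction are the assumptions from the example above):

DECLARE @sample_mb  bigint,
        @sample_pct decimal(5, 2) = 10.0;  -- fraction copied by SELECT TOP n PERCENT

-- reserved size of the compressed sample in MB (8KB pages)
SELECT @sample_mb = SUM(reserved_page_count) * 8 / 1024
FROM tempdb.sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('tempdb.dbo.yourTable');

-- scale up by the sampling fraction to estimate the full table
SELECT estimated_full_table_mb = @sample_mb * 100 / @sample_pct;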

The documentation suggests up to 7x compression, but I have been seeing larger factors in my testing, particularly with COLUMNSTORE_ARCHIVE mode, though clearly with some CPU overhead when reading the data back. YMMV.