SQL Server – Testing Table for Compression Candidate

compressionsql server

Does anyone know a tool like this page https://columnscore.com/ where one can determine if a table is a good candidate for row or page compression.

Also I am trying to understand the benefits of compression. I have been reading about it, but still have these questions.

  1. Will compression make more 8KB fit in the buffer cache? When does data become uncompressed i.e. in a select?
  2. will backups become faster?
  3. Is the reason that compression ads cpu overhead because e.g. inserts need to be compressed before the insert?
  4. Is there any good reason to compress tables without investigating how the tables is being used (i.e. static vs. volatile).

A colleague has started to compress many tables (page). I don't know the analysis behind this decision. I know we sometimes have resource semaphore waits during some peak hours. I actually don't think he knows what he is doing. I can see that our sos scheduler yield waits have gone slightly up.

I think the compression is causing more harm and it seems that maintenance tasks overshoot more now (e.g. backups, checkdb…).

Best Answer

Does anyone know a tool like this page https://columnscore.com/ where one can determine if a table is a good candidate for row or page compression.

Tool recommendations are off-topic here, primarily because there can be any number of answers, and they tend to go out-of-date quickly. That said, SQL Server does provide a built-in procedure to estimate space savings from compression, sp_estimate_data_compression_savings. It does not attempt to predict whether compression will be a good option for the workload as a whole, just whether space savings can be expected or not. It is essentially a wrapper around TABLESAMPLE, creating a new object and extrapolating full size from there.

Will compression make more 8KB fit in the buffer cache?

If compression is effective, then yes. Pages remain compressed in the buffer pool.

When does data become uncompressed i.e. in a select?

Data is decompressed when a SQL Server component above the level of the storage engine needs to work with the data. For example, an Index Scan will present decompressed data to a Filter operator in an execution plan.

SQL Server major components

The stack trace below shows one example of decompression before presenting data to the query processor (storage engine code highlighted):

Stack trace

You can read more about this in Compression Strategies by Sunil Agarwal (part of the SQL Server engineering team at the time).

will backups become faster?

Many people have reported that they do. The intuition is that compressed data on persistent storage requires fewer I/Os to read for backup. If your instance is typically CPU-bound, you may see slower backups as the extra processor work needed for decompression outweighs the I/O speedup.

Is the reason that compression ads cpu overhead because e.g. inserts need to be compressed before the insert?

Certainly compression adds CPU overhead, but this is typically manageable because the data compression algorithms were chosen for efficiency rather than ultimate size. Note that data compression and backup compression are very different. Backup compression is based on a variant of gzip compression, and is intended to achieve very good size reduction at the expense of more CPU usage.

Every insert and update will incur at least some data compression overhead. This is more noticeable with PAGE compression. SQL Server may also attempt to recompress an entire page after a certain number of modifications have been made to that page.

Is there any good reason to compress tables without investigating how the tables is being used (i.e. static vs. volatile).

This is subjective, and depends on your local priorities. In general, one might choose to compress without detailed testing where it is clear substantial (and necessary) space savings will be made, and the system has plenty of spare processor time available.

A colleague has started to compress many tables (page). I don't know the analysis behind this decision.

You should discuss the rationale with your colleague.