Is SQL Server data compression categorically good for read-only databases?

compression | data-warehouse | sql-server | sql-server-2012

Some of the literature on SQL Server data compression that I have read states that the write cost increases to about four times what would normally be required. It also seems to imply that this is the primary downside to data compression, strongly implying that for a read-only archive database, performance will (with few exceptions) be improved by using data compression on 100%-filled pages.

  1. Are the statements above true?
  2. What are the primary differences between compressed and uncompressed data (for reading)?

    • "CPU +x%"?
    • "IO -y%"?
    • page split occurrence?
    • tempdb usage?
    • RAM usage?
  3. And for writing?

For the purpose of this question, you can limit the context to PAGE-level compression of a big (> 1TB) database, but additional comments are always welcome.
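For concreteness, the expected benefit of PAGE compression on a specific table can be previewed with sp_estimate_data_compression_savings before committing to a rebuild; the table name below (dbo.FactSales) is just a placeholder:

    -- Estimate current vs. PAGE-compressed size for every index and partition of one table.
    EXEC sys.sp_estimate_data_compression_savings
        @schema_name      = N'dbo',
        @object_name      = N'FactSales',   -- placeholder table name
        @index_id         = NULL,           -- NULL = all indexes
        @partition_number = NULL,           -- NULL = all partitions
        @data_compression = N'PAGE';

    -- If the estimate looks worthwhile, compression is applied with a rebuild, e.g.:
    -- ALTER INDEX ALL ON dbo.FactSales REBUILD WITH (DATA_COMPRESSION = PAGE);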


References:

SQL Server Storage Engine Blog (The DW scenario shows compression to be very advantageous)
Data Compression: Strategy, Capacity Planning and Best Practices, from which the following excerpt is taken:

A more detailed approach to deciding what to compress involves analyzing the workload characteristics for each table and index. It is based on the following two metrics:

U: The percentage of update operations on a specific table, index, or partition, relative to total operations on that object. The lower the value of U (that is, the table, index, or partition is infrequently updated), the better candidate it is for page compression.

S: The percentage of scan operations on a table, index, or partition, relative to total operations on that object. The higher the value of S (that is, the table, index, or partition is mostly scanned), the better candidate it is for page compression.

Both of the above are demonstrably biased towards recommending page compression for DW-style databases (read-intensive/exclusive, big-data operations).
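For reference, the whitepaper derives U and S from sys.dm_db_index_operational_stats. A rough sketch of that calculation is below; the exact set of counters used in the denominator reflects my reading of the article, so treat it as an approximation rather than the canonical query:

    -- Approximate U (update %) and S (scan %) per index, in the spirit of the whitepaper.
    SELECT o.name AS table_name,
           i.name AS index_name,
           s.partition_number,
           s.leaf_update_count * 100.0 /
               NULLIF(s.range_scan_count + s.singleton_lookup_count + s.leaf_insert_count
                    + s.leaf_delete_count + s.leaf_update_count
                    + s.leaf_page_merge_count, 0) AS U_update_pct,
           s.range_scan_count * 100.0 /
               NULLIF(s.range_scan_count + s.singleton_lookup_count + s.leaf_insert_count
                    + s.leaf_delete_count + s.leaf_update_count
                    + s.leaf_page_merge_count, 0) AS S_scan_pct
    FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS s
    JOIN sys.objects AS o ON o.object_id = s.object_id
    JOIN sys.indexes AS i ON i.object_id = s.object_id
                         AND i.index_id  = s.index_id
    WHERE o.is_ms_shipped = 0
    ORDER BY S_scan_pct DESC;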

Best Answer

Just my 2 cents from my own experiments on 1-2 year old hardware:

For read-only operations (DW-style scans, sorts, etc.) on page-compressed tables (~80 rows/page), I've found the break-even point to be a size reduction from compression of roughly 3x.

That is, if the tables fit into memory anyway, page compression only benefits performance if the data size has shrunk by more than about 3x: you scan fewer pages in memory, but each page takes longer to scan.
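One way to sanity-check the "fits into memory" assumption is to see how many of a table's pages are actually resident in the buffer pool. A rough sketch using the standard DMVs follows; the container_id join only covers in-row and row-overflow allocation units, so take the numbers as approximate:

    -- Approximate buffer-pool footprint per table in the current database.
    SELECT OBJECT_NAME(p.object_id)  AS table_name,
           COUNT(*)                  AS cached_pages,
           COUNT(*) * 8 / 1024       AS cached_mb
    FROM sys.dm_os_buffer_descriptors AS bd
    JOIN sys.allocation_units AS au ON au.allocation_unit_id = bd.allocation_unit_id
    JOIN sys.partitions       AS p  ON p.hobt_id = au.container_id
                                   AND au.type IN (1, 3)   -- IN_ROW_DATA, ROW_OVERFLOW_DATA
    WHERE bd.database_id = DB_ID()
    GROUP BY p.object_id
    ORDER BY cached_pages DESC;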

Your mileage may vary if your plans are nested-loop and seek-heavy. Among other things, this will also be hardware-dependent (foreign NUMA node access penalties, memory speed, etc.).

The above is just a rough rule of thumb that I follow, based on my own test runs using my own queries on my own hardware (Dell PowerEdge 910 and newer). It is not gospel, eh!

Edit: Yesterday the excellent SQLBits XI presentation by Thomas Kejser was made available as a video. Quite relevant to this discussion, it shows the 'ugly' side of the CPU cost of page compression: updates slowed down by about 4x, and locks held for quite a bit longer.

However, Thomas is using FusionIO storage, and he picked a table that is only 'just' eligible for page compression. If the storage were a typical SAN and the data compressed 3x-4x, the picture might have been less dramatic.