MySQL – Trying to fit indexes in memory: partitioning vs compression or neither

MySQL

I have a "transaction" table that grows at a rate of about 150k rows per day. 97% of user aggregate queries refer to the last 3 months of data and must run very efficiently. Trimming older transaction data is not an option, as I need to keep it available going back 12 years. (Splitting "recent" from "old" transactions into separate tables is an option, but I'm trying to avoid that complication if possible.)

Would you recommend table partitioning to help keep the indexes in memory? Currently the table has a bigint primary key, so if I go with partitioning by date range, I'd have to add a datetime column to the PK, which would roughly double the PK's memory footprint. That feels counterproductive.

Or would table compression perhaps help more? Specifically, MySQL 5.7 permits page-level compression for InnoDB tables that reside in file-per-table tablespaces (mine do).
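For reference, what I have in mind is something along these lines (my table name; zlib is the default algorithm, and the filesystem must support hole punching):

```sql
-- MySQL 5.7 transparent page compression; requires file-per-table
-- tablespaces and a hole-punching filesystem (e.g. ext4/xfs on Linux)
ALTER TABLE transaction COMPRESSION='zlib';

-- Rebuild the table so existing pages are rewritten compressed;
-- only pages written after the ALTER are compressed otherwise
OPTIMIZE TABLE transaction;
```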

Increasing system memory is not an option for a while, as I'm pinned to an AWS reserved instance for the time being.

Best Answer

Indexes and data are stored in pages. The buffer pool holds the "hottest" pages: if you frequently access only one quarter of your table, then it is mostly that "hot" part of the table that will sit in the buffer pool, because an LRU list decides which pages get evicted at any given time.
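You can sanity-check what is actually resident with the `information_schema.INNODB_BUFFER_PAGE` table. A rough query (note that scanning this table can be expensive on a large buffer pool, so avoid it on a busy production server):

```sql
-- Which tables currently occupy the most buffer pool pages
SELECT TABLE_NAME,
       COUNT(*)                        AS pages,
       ROUND(SUM(DATA_SIZE)/1024/1024) AS data_mb
FROM information_schema.INNODB_BUFFER_PAGE
GROUP BY TABLE_NAME
ORDER BY pages DESC
LIMIT 10;
```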

Partitioning is a good alternative, which also helps with data cleanup (simply dropping a partition), but yes, as you said, it comes with some restrictions.

If you have an auto-increment primary key, you can also set up ranges on that. Since you have a pretty good estimate of how many rows you insert per day, it should be easy to calculate id ranges that translate nicely into date ranges.
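A sketch of that idea (the boundary values are invented; derive the real ones from your own id-to-date mapping, at ~150k rows/day a quarter is roughly 13.5M ids):

```sql
-- Quarterly range partitions on the existing bigint auto-increment PK,
-- so the PK does not need a datetime column added to it
ALTER TABLE transaction
PARTITION BY RANGE (id) (
    PARTITION p2016q1  VALUES LESS THAN (513500000),
    PARTITION p2016q2  VALUES LESS THAN (527000000),
    PARTITION p2016q3  VALUES LESS THAN (540500000),
    PARTITION pcurrent VALUES LESS THAN MAXVALUE
);
```

Keep in mind that partition pruning only kicks in when a query filters on `id`, so your aggregate queries would have to translate their date predicates into id ranges (e.g. `WHERE id >= 527000000`) to benefit.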

Experiment with the setup and see whether it improves performance.

InnoDB compresses at the page level, but there are many gotchas:

1) You should first check whether your dataset is compressible enough to benefit. InnoDB uses KEY_BLOCK_SIZE: if a page compresses below that size, it is stored compressed; if it doesn't fit, InnoDB has to split the page and recompress, which is expensive.

2) To minimize disk I/O, both the compressed and the uncompressed version of a page may be kept in the buffer pool. Assuming at least 50% compressibility (KEY_BLOCK_SIZE=8 on 16K pages), that means you sacrifice about 33% of your buffer pool on the altar of compression, which may become an issue. Better compression ratios improve this: with KEY_BLOCK_SIZE=4 the overhead is only about 20%, but then every page has to be at least 1:4 compressible.

From http://dev.mysql.com/doc/refman/5.5/en/innodb-compression-internals.html:

Thus, at any given time, the buffer pool might contain both the compressed and uncompressed forms of the page, or only the compressed form of the page, or neither.

A good explanation of the scenario is available in this answer: https://serverfault.com/questions/358444/setting-mysql-innodb-compression-key-block-size

3) Table compression can also lead to serious mutex contention issues. For more details: https://www.percona.com/blog/2011/05/20/innodb-compression-woes/
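If you still want to measure whether your data compresses well enough, one rough test (table name, sample size and KEY_BLOCK_SIZE below are illustrative) is to build a compressed copy of a sample and then check the `INNODB_CMP` counters:

```sql
-- Build a compressed copy of a sample of the data
CREATE TABLE transaction_cmp LIKE transaction;
ALTER TABLE transaction_cmp ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
INSERT INTO transaction_cmp SELECT * FROM transaction LIMIT 1000000;

-- compress_ops_ok close to compress_ops means pages fit into
-- KEY_BLOCK_SIZE without costly splits and recompressions
SELECT page_size, compress_ops, compress_ops_ok
FROM information_schema.INNODB_CMP;
```

Comparing the on-disk sizes of the two tables (e.g. via `information_schema.TABLES`) also gives you a feel for the actual space saving.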

Therefore I wouldn't go for compression unless you have big varchar columns that you want indexed. Otherwise you can compress either at the application level or on the filesystem; each has its own benefits, and both are much more efficient than InnoDB compression.