Mysql – When range partitioning, should I drop primary UUID-like key

MySQLmysql-5.6partitioning

I'm aware of the limitation "every unique key on the table must use every column in the table's partitioning expression". So given that my primary key is a UUID, it's going to require a change.

Background: I have a transaction history table that grows at about 1.5G per month (more so than I can keep upgrading server memory). The table has a PRIMARY key (uuid_short) and a datetime column, among others. There are two types of queries hitting the table: those that return a list of transactions in a date-range, and those that refer to a specific transaction through its primary key.

Question: Since vast majority of queries don't care for transactions more than 18 months old I thought datetime could be a candidate for range partitioning. Does this mean I'm better off dropping the PRIMARY key and converting it to a non-unique key? (dups are not a problem, the app servers guarantee there cannot be any).

The system is currently a hybrid of OLTP and OLAP scenarios which is probably why it doesn't scale.

Is partitioning going to make a positive impact on memory footprint?

If not, then I may be better off reorganizing the system by moving historical transactions to a data warehouse like AWS Redshift. It would be quite a refactoring effort, but if that's the preferred solution then I'd like to plan early.

This is MySQL 5.6.

Best Answer

Assuming

  • Some queries involve a date range, usually "recent" dates,
  • Other queries are point-queries, which involves a unique key
  • You can purge data older than 18 months.

Anything else?

UUIDs are terrible for performance when the dataset size becomes too big to cache.

It would take some work, but this is what I recommend:

id INT UNSIGNED NOT NULL AUTO_INCREMENT -- or `BIGINT` if might go past 4 billion
dt DATETIME NOT NULL  -- or TIMESTAMP if you prefer
PRIMARY KEY(dt, id)
INDEX(id)

Characteristics:

  • The data is in datetime order, thereby making those queries very efficient,
  • Tacking id onto the PK assures that the it is unique,
  • id replaces your uuid_short in the application (so, code change),
  • There is a PK, which InnoDB really likes to have,
  • You won't need to keep adding RAM -- most of the activity is near the "end" of the table,
  • INSERTs are very efficient -- both the data (& PK) and the secondary key (id) are being added to at the end.

Now, I will add on the purging.

PARTITION BY RANGE(TO_DAYS(dt)) is the optimal way to do it. Define the ranges as months. Each month: DROP (instantly) the oldest month and ADD a new month. See my blog for more details.

How to transition from uuid_short to id?

  1. Add the AUTO_INCREMENT id and INDEX(id) to the existing table in a single ALTER,
  2. Convert the app's use of uuid_short to id,
  3. Drop the uuid_short column and change the indexes (all in a single ALTER),
  4. Add PARTITIONing.

Steps 1,3,4 will each require copying over the entire table; the sooner you do it, the less downtime you will suffer. Or consider pt-online-schema-change or gh-ost for doing them "online".

Another tip: See if you can shrink any of the datatypes of other columns.