MySQL – How to Efficiently Swap a Portion of a Large Table’s Data

MySQLpartitioning

I run an application (price comparison website) where I need to update a pricing table with dozens of millions of rows every day.

Basically, every product feed I download from a merchant results in several million rows to delete from the pricing table (WHERE merchantId = x), and several million fresh rows to insert in place of the old ones.

So far, I used to download all the feeds sequentially, insert all of them in a temporary table, then swap the old table with the new one in a single RENAME TABLE query.

This forced me to update all the merchants at once, hence I'd like to move away from this method, and have the ability to process once merchant at a time, deleting all records WHERE merchantId = x and insert all new records in a sub-second query.

My first thought was to use partitioning by list, one partition per merchant. This would solve half of my problem: I can use TRUNCATE PARTITION to quickly remove all existing prices for a given merchant, but the other half of the problem remains: how to quickly insert the millions of new rows, without downtime and unnecessary locks, and without running with the table partially loaded for several minutes?

So I was thinking: is it possible to create a temporary table with the same structure as the partitioned table, fill it with the new rows, and once it is complete, swap one partition of the pricing table with this temporary table?

Any suggestion to solve this problem would be appreciated, including suggestions that do not involve partitioning. I have thought about the MRG_MYISAM engine as well, but I'm not sure whether it fulfils my needs. On a final note, any storage engine is fine for this pricing table, including non-transactional engines.

Best Answer

After a little bit more digging on the MySQL website, I found out about Exchanging Partitions and Subpartitions with Tables, which seems to be a new MySQL 5.6 feature:

ALTER TABLE x EXCHANGE PARTITION y WITH TABLE z;

It seems to be exactly what I was looking for.

After running some benchmarks on my server, I found out that exchanging a partition with a table is not instant, but still very fast: about 1s per 1,700,000 records in the table being integrated in the partitioned table (the number of records in the partition being taken out does not seem to have any influence on the speed), which is very close to the sub-second swapping I was hoping to get for a table with a few million records.

This is to be compared against inserting the rows directly in the partitioned table, where I get about 27,000 inserts / second.

Hence swapping an already filled up table with a partition is roughly 60x faster (for me, YMMV) than inserting directly into the partitioned table.

Sweet!