I'm aware of the limitation "every unique key on the table must use every column in the table's partitioning expression". So given that my primary key is a UUID, it's going to require a change.
Background: I have a transaction history table that grows at about 1.5G per month (more so than I can keep upgrading server memory). The table has a PRIMARY key (uuid_short) and a datetime column, among others. There are two types of queries hitting the table: those that return a list of transactions in a date-range, and those that refer to a specific transaction through its primary key.
Question: Since vast majority of queries don't care for transactions more than 18 months old I thought datetime could be a candidate for range partitioning. Does this mean I'm better off dropping the PRIMARY key and converting it to a non-unique key? (dups are not a problem, the app servers guarantee there cannot be any).
The system is currently a hybrid of OLTP and OLAP scenarios which is probably why it doesn't scale.
Is partitioning going to make a positive impact on memory footprint?
If not, then I may be better off reorganizing the system by moving historical transactions to a data warehouse like AWS Redshift. It would be quite a refactoring effort, but if that's the preferred solution then I'd like to plan early.
This is MySQL 5.6.
Best Answer
Assuming
Anything else?
UUIDs
are terrible for performance when the dataset size becomes too big to cache.It would take some work, but this is what I recommend:
Characteristics:
id
onto the PK assures that the it is unique,id
replaces your uuid_short in the application (so, code change),INSERTs
are very efficient -- both the data (& PK) and the secondary key (id) are being added to at the end.Now, I will add on the purging.
PARTITION BY RANGE(TO_DAYS(dt))
is the optimal way to do it. Define the ranges as months. Each month:DROP
(instantly) the oldest month andADD
a new month. See my blog for more details.How to transition from uuid_short to id?
INDEX(id)
to the existing table in a singleALTER
,id
,ALTER
),PARTITIONing
.Steps 1,3,4 will each require copying over the entire table; the sooner you do it, the less downtime you will suffer. Or consider
pt-online-schema-change
orgh-ost
for doing them "online".Another tip: See if you can shrink any of the datatypes of other columns.