One of our tables is over 200gb and the old records are hardly ever queried, and this makes me think partitioning is a good solution, however I am having trouble figuring out which column to use as a Partition Key for a table.
We have 2 tables as follows (names and structures are altered for confidentiality):
transactions
table – with a transaction ID,checks
table – With the schema(id, transcation_id, created_at, metadata, last_updated_at)
there can be multiple different checks with for a giventransaction_id
As part of data aggregation we write a summarized version of the checks into a single column in the transactions
table. The way we do this aggregation is by running an UPDATE
on transactions
table with joins on the checks table filtering by last_updated_at
to only check for checks that have been updated recently.
Users however query using created_at
in the checks
table, and also last_updated_at
could potentially change for a given row (although not likely).
Which one should I use as partition key? created_at
or last_updated_at
?
Best Answer
Most queries become slower with partitioning.
The exceptions I know are:
WHERE
clauseSo you should rather consider how data expire. Partition your tables so that you can get rid of old, obsolete data by dropping partitions.