I have a fairly large PostgreSQL database (with the TimescaleDB extension). It currently consumes about 500 GB on an SSD. Most of the data is time series, and in most cases data older than a few months isn't really interesting.
My idea was to move that older data to a cheap SATA hard drive instead of buying more expensive SSDs. Is that a good idea, and is there an established good practice for implementing it?
My naive implementation would be:
Keep two databases (or create a tablespace on the cheap HDD). Every few hours, copy data from the "fast" (SSD) database to the "slow" (HDD) database. Every few days, delete the copied data from the fast database. Is this a good idea? I am happy to hear feedback and better suggestions.
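Roughly what I have in mind for the tablespace variant (readings and ts are placeholder names for one of my time series tables and its timestamp column, and here the copy and the delete are combined into one statement):

```sql
-- Tablespace on the cheap HDD, plus an archive table stored there
CREATE TABLESPACE slow_hdd LOCATION '/mnt/hdd/pg_tablespace';
CREATE TABLE readings_archive (LIKE readings INCLUDING ALL) TABLESPACE slow_hdd;

-- Run every few hours: move rows older than three months to the archive table
-- (DELETE ... RETURNING feeds the INSERT, so copy and delete stay consistent)
WITH moved AS (
    DELETE FROM readings
    WHERE ts < now() - interval '3 months'
    RETURNING *
)
INSERT INTO readings_archive
SELECT * FROM moved;
```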
Best Answer
Here is a better architecture:
- Create a new tablespace on the slow drive.
- Set the storage parameters seq_page_cost and random_page_cost higher on that new tablespace, so that the PostgreSQL optimizer knows that the disks are slower.
- Partition the big time series tables by time ranges (use the same boundaries for all affected tables), so that you end up with a couple of dozen partitions for each.
- Move the old partitions to the slow tablespace (see the sketch after this list).
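A minimal sketch of those steps, assuming a hypothetical table readings with a timestamptz column ts (adjust names, columns and partition boundaries to your schema):

```sql
-- Tablespace on the slow drive, with cost parameters raised so the planner
-- knows reads from it are more expensive than from the default (SSD) storage
CREATE TABLESPACE slow_hdd LOCATION '/mnt/hdd/pg_tablespace';
ALTER TABLESPACE slow_hdd SET (seq_page_cost = 2.0, random_page_cost = 10.0);

-- Range-partition the time series table by time
CREATE TABLE readings (
    ts     timestamptz      NOT NULL,
    device bigint           NOT NULL,
    value  double precision
) PARTITION BY RANGE (ts);

CREATE TABLE readings_2023_h1 PARTITION OF readings
    FOR VALUES FROM ('2023-01-01') TO ('2023-07-01');
CREATE TABLE readings_2023_h2 PARTITION OF readings
    FOR VALUES FROM ('2023-07-01') TO ('2024-01-01');

-- Move an old partition to the slow tablespace; its indexes can be moved
-- the same way with ALTER INDEX ... SET TABLESPACE
ALTER TABLE readings_2023_h1 SET TABLESPACE slow_hdd;
```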
Then you still have all the data accessible.
Use PostgreSQL v11 or better for partitioning.
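To illustrate the point about accessibility: all the data stays queryable through the parent table, and queries restricted to recent time ranges are pruned to the partitions that still live on the SSD. Continuing the hypothetical readings example from above:

```sql
-- Partition pruning: with a constant predicate on the partition key, the
-- planner scans only the partitions covering that range, so a query over
-- recent data never touches the HDD partitions.
EXPLAIN
SELECT avg(value)
FROM readings
WHERE ts >= '2023-12-01';
```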