Postgresql – Distributing a single table across multiple disks of different performance

performanceperformance-tuningpostgresql

I have a table that increases by several million rows per day. It holds over a year's worth of data.

The table is regularly SELECTed from, but the older the data is the less frequently it is accessed. The current data (i.e. data added in the last hour) is very frequently queried, but as the records get older they are less likely to be queried.

The database is currently on a traditional (magnetic) hard disk. I also have a much smaller solid state disk that is very fast, but not large enough to hold more than few days' worth of data.

I’d like to somehow spread the table across the two disks. I’d like new data to be INSERTed into the small, high performance disk, and then invisibly migrated to the slower but larger disk after a few days (when it’s less likely to be needed).

I know I could split the data into two tables (let's call then TABLE_SLOW and TABLE_FAST). I could insert into TABLE_FAST and then run an archiving process to transfer the oldest records into TABLE_SLOW every night, and then create a view based on a union of TABLE_SLOW and TABLE_FAST to use for queries – but I’m looking for a solution that wouldn’t require any changes to the application, only the database.

The data includes a field called ‘loadedTime’ so there’s no problem working out how old the data is.

Best Answer

How to put data on a different disk: (using tablespaces)

CREATE TABLESPACE fastspace LOCATION '/srv/fast_disk/postgresql/data';

https://www.postgresql.org/docs/10/static/manage-ag-tablespaces.html

.

How to make two tables behave like one: (SQL inheritance)

CREATE TABLE TABLE_FAST () inherits TABLE_SLOW TABLESPACE fastspace;

You will also need to add constraints and indices as apropriate to this new table.

https://www.postgresql.org/docs/10/static/tutorial-inheritance.html

Now selects from TABLE_SLOW will also see the data in TABLE_FAST

you may also benefit from implementing constraint exclusion (also described on that page), but the index on Loaded_Time (which I am assuming you have) should give a good speed increase itself

.

How to move data data: (CTE delete returning)

eg: 3 days old

with deleted as ( 
     delete from only TABLE_FAST 
     where Loaded_Time < 'now()' - '3 days'::interval
     returning * )
insert into slow_table select * from deleted;