PostgreSQL – Continuously Adding SSDs to an Existing PostgreSQL Server

postgresql raid scalability tablespaces timescaledb

An Ubuntu 18.04 server dedicated to running PostgreSQL 11.2 (with the TimescaleDB extension) will run out of disk space soon, so new SSDs will need to be added to the machine to support the growing database.

The data is expected to keep growing at the same or a higher rate, so storage will have to be expanded repeatedly until the machine runs out of 2.5" drive bays. Only then will distributing the database across multiple machines be considered, due to the added complexity involved.

Thoughts

  1. Union file systems like mergerfs can pool the drives together, solving the storage expansion problem easily. However, they add latency to database operations and are thus not recommended. Redundancy can be added with an underlying RAID-1/5/6/10 or with SnapRAID.

  2. RAID-0 and RAID-10 allow expanding the array onto newly added drives, with the bonus of increased performance from striping. However, every added drive is an additional point of failure, and the array's failure rate grows roughly linearly with each additional SSD. Furthermore, numerous people claim that mirroring SSDs is of limited use, because both SSDs in a RAID-1 pair receive identical writes and will likely wear out and fail at around the same time. If so, RAID-10 may be no safer than RAID-0.

  3. RAID-5/6 reduce write performance due to parity calculations: each small write requires reading and rewriting both the data block and the parity block (a write penalty of 4 for RAID-5), cutting effective write IOPS by roughly 75%. This seems to be a poor choice for databases.

  4. PostgreSQL TABLESPACES can be used to assign each table to a specific drive (see the sketch after this list). However, using tablespaces will make recovery much more complicated. Also, is it possible to create new tablespaces on the new drives and let Postgres automatically decide where to write new records?

  5. ZFS, Btrfs? Not familiar with them, but willing to explore them if they are suitable.
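
A minimal sketch of the tablespace approach from point 4, assuming a new SSD is already formatted and mounted at /mnt/ssd2 (the mountpoint and table names are hypothetical):

    -- The directory must exist and be owned by the postgres OS user.
    CREATE TABLESPACE ssd2 LOCATION '/mnt/ssd2/pgdata';

    -- Place a new table on the new drive explicitly...
    CREATE TABLE readings_2020 (LIKE readings INCLUDING ALL) TABLESPACE ssd2;

    -- ...or move an existing table there (locks the table and rewrites its data).
    ALTER TABLE readings SET TABLESPACE ssd2;

Note that stock PostgreSQL does not balance new rows across tablespaces on its own: each table (or partition) lives on exactly one tablespace, which is what the attach_tablespace suggestion in the answer below addresses.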

Question: What is the year-2020 recommended method of expanding the storage of a PostgreSQL machine, given that expansion has to be done frequently (1-2 times a year), performance should not suffer much, and recovery should not be so complicated that it risks data loss?

RAID-10 seems like a good idea to me, except that RAID-1 appears to be of limited usefulness here (per the SSD wear argument above) while 'losing' half the disk space, made worse by the growing number of failure points as drives are added.

Due to budget constraints, we cannot fill up all 16 drive bays in a 2U chassis with SSDs at once, so it has to be done incrementally.

Any advice is greatly appreciated!

EDIT: After looking into ZFS, it seems like this might be one of the solutions for my case.

  • A ZFS pool with only mirrored ZFS vdevs (2 drives per vdev) will allow storage pool expansion by adding 2 drives at a time

  • Keeping 1-2 hot spares in the ZFS pool allows ZFS to fail over automatically when one of the drives fails

  • With mirrored vdevs, the rebuild (resilver) after a drive failure will be far faster than with RAIDZ vdevs. This also reduces the chance that the surviving drive in the pair fails during the rebuild. A degraded mirrored vdev will also perform much better during rebuild than a degraded RAIDZ vdev

  • ZFS supports inline data compression, which helps significantly (~4X compression) in reducing the storage requirements of TimescaleDB data without resorting to native TimescaleDB compression, which prevents compressed data from being updated unless it is decompressed first. This matters because TimescaleDB is known to compress data less effectively than other time-series databases such as InfluxDB

  • Monitor and replace the SSDs when they are about to fail, as suggested by jjanes

If I understand this correctly, using ZFS mirrored vdevs ticks all my boxes: drives can be added continuously, the drives are pooled under a single mountpoint, the data is redundant, and there is a bonus ~4X data compression. A minimal sketch of the corresponding zpool commands follows.
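
The device names below are hypothetical placeholders (/dev/disk/by-id paths are preferable in practice):

    # Create the pool from the first mirrored pair and enable inline compression.
    zpool create tank mirror /dev/sda /dev/sdb
    zfs set compression=lz4 tank

    # Each future expansion: add another mirrored pair, 2 drives at a time.
    zpool add tank mirror /dev/sdc /dev/sdd

    # Keep a hot spare so a failed drive can be resilvered onto automatically.
    zpool add tank spare /dev/sde

    # Give the PostgreSQL data directory its own dataset; an 8k recordsize
    # is often recommended to match PostgreSQL's 8 kB page size.
    zfs create -o mountpoint=/var/lib/postgresql -o recordsize=8k tank/postgres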

Best Answer

Have you turned on TimescaleDB's native compression, available since v1.5? I ask because you mention ZFS/BTRFS above, which suggests you aren't employing TimescaleDB's own compression already.

Typical deployments see 90-98% storage savings in the wild.

https://docs.timescale.com/latest/using-timescaledb/compression
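
On TimescaleDB 1.x, enabling native compression on a hypertable looks roughly like this (the hypertable and column names are hypothetical):

    -- Enable compression, segmenting by device for better compression ratios.
    ALTER TABLE readings SET (
        timescaledb.compress,
        timescaledb.compress_segmentby = 'device_id',
        timescaledb.compress_orderby   = 'time DESC'
    );

    -- Automatically compress chunks older than 7 days
    -- (add_compress_chunks_policy is the TimescaleDB 1.x name for this policy).
    SELECT add_compress_chunks_policy('readings', INTERVAL '7 days');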

Additionally, you can use TimescaleDB's attach_tablespace command to add more disks to a hypertable; new chunks are then load-balanced across the attached tablespaces (see the sketch below).

https://docs.timescale.com/latest/api#attach_tablespace
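
A minimal sketch of that workflow, reusing the hypothetical table and mountpoint names from earlier:

    -- Create a tablespace on the newly mounted SSD.
    CREATE TABLESPACE ssd3 LOCATION '/mnt/ssd3/pgdata';

    -- Attach it to the hypertable; TimescaleDB spreads newly created chunks
    -- across all tablespaces attached to that hypertable.
    SELECT attach_tablespace('ssd3', 'readings');

    -- List the tablespaces currently attached to the hypertable.
    SELECT * FROM show_tablespaces('readings');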