PostgreSQL 9.3 – Server Integrity After Losing a Tablespace

postgresqlpostgresql-9.3tablespaces

I'm running a PostgreSQL server with 2 tablespaces: 'ssd' and 'hdd'. The first contains databases that need to be queried quickly, and is located on a single SSD that is not redundant in any way (no RAID). I don't care if I lose these databases to a disk failure. By contrast, the 'hdd' tablespace is located on a RAID array and contains critical databases that need to survive a disk failure.

Question: if my SSD fails and PostgreSQL loses the 'ssd' tablespace completely (or it becomes corrupt/unstable), will databases in 'hdd' continue functioning normally? Or will the entire server experience issues?

Best Answer

I can't describe this better than the manual does

Warning

Even though located outside the main PostgreSQL data directory, tablespaces are an integral part of the database cluster and cannot be treated as an autonomous collection of data files. They are dependent on metadata contained in the main data directory, and therefore cannot be attached to a different database cluster or backed up individually. Similarly, if you lose a tablespace (file deletion, disk failure, etc), the database cluster might become unreadable or unable to start. Placing a tablespace on a temporary file system like a RAM disk risks the reliability of the entire cluster.

Or this blog post

You can’t just throw a tablespace away and have the rest of the database system keep on working.

Most importantly, the write-ahead log (WAL) that provides PostgreSQL’s crash safety is stored in pg_xlog, and is shared across all tablespaces. PostgreSQL expects to be able to replay this log in order and without errors after a crash or shutdown. Until the log has replayed, the database is assumed to be in an unsafe state and connections will be refused

Identifying relations (lost `pg_class`)

Pg's on-disk layout is minimalist, with almost all information about tables stored in the system catalogs, in tables like pg_class. That's necessary to properly support transactional DDL. The system catalogs are part of the main data directory.

There is a metapage for each table, but it won't tell you much.

The file layout of the tablespace is a tree of database-oids with relation-relfileid nodes within it. The mapping of relations relfileids to actual tables is stored ... in pg_class, in the main database.

So you have a collection of loose relation forks, with none of the metadata that tells you what they are. And it gets worse.

TOAST tables

As if that wasn't bad enough, lots of data is stored out-of-line in TOAST tables.

The only thing that connects TOAST tables to the main tables is the system catalogs.

Transaction visibility, lost `pg_clog` and `pg_controldata`

PostgreSQL records the status of transactions - which tx's are in progress, which ones committed, which rolled back, etc - in pg_clog. You guessed it, in the main database. The control file also resides in the main DB, and keeps track of things like the transaction ID wrap-around point.

Every tuple in a table has xmin and xmax fields, which together tell readers whether they should "see" a tuple when scanning the table. You've lost the records that give the transaction IDs in xmin and xmax meaning. So when you read your tables, you may get:

Duplicate rows where an UPDATE has marked the old row's xmax and created a new row.
Deleted rows
The results of rolled back transactions

Essentially, if you can read the table at all, it'll be a dirty read.

Unwritten changes and lost `pg_xlog`

PostgreSQL records changes to pg_xlog, then lazily applies them, possibly out of order, to the main relation forks. On crash it will replay these changes.

It doesn't matter if the on-disk status is inconsistent so long as the transaction logs are OK and the in-memory state in the buffer cache is fine.

You've lost the transaction logs, so your tables might be in a partially-replayed state with inconsistent/invalid data.

OK, so your data is a mess, what can you recover from it anyway?

Before you attempt to recover anything, make a complete copy of the original data. Do not change this. Any recovery attempts might make things worse, so you need to keep the originals.

Also, do not attempt any of this on a PostgreSQL install that contains anything you care about. initdb a throw-away PostgreSQL instance for the work. Also, you must be using exactly the same PostgreSQL version.

The first thing I'd do would be CREATE TABLE the original table structure. If you ever dropped columns or changed column types, you must repeat what you did exactly when you recreate the table. Once you're done, check the relfilenode that backs the table:

SELECT relname, relfilenode FROM pg_class WHERE relname = 'mytable';

You also need to find the associated TOAST table's relfileid.

Then I'd disconnect and stop PostgreSQL cleanly (pg_ctl stop -m smart) and try copying the lost table over to the file with the relfilenode of the newly recreated table. If the original had lots of forks (1234.1, 1234.2, etc) you need to copy and rename all of them. You might be able to match up which table is which based on file size, or by looking at strings for it.

Do the same for the TOAST table associated with the relation.

Then start PostgreSQL in standalone backend mode and attempt to read the table.

If you're really lucky it won't crash or ERROR, and will just give you an empty table or bogus results.

At this point, you then need to use pg_dirtyread to attempt to read the table data while ignoring transaction visibility. See the github repo.

It'd be quite nice to have a tool to dump a raw PostgreSQL database table standalone, but I don't think there's such a thing.

Others may have better suggestions about how to attempt recovery. It isn't something I've done, and I don't have time to test it out right now.

I suggest doing some more reading and research, see if anyone else has written about this already. Also post on pgsql-general@postgresql.org for further advice.

Postgresql – How to recover the database after losing a tablespace with indexes

How much improvement are you expecting from the SSD? It is surprising how little physical I/O actually happens on busy indexes and I don't think I've ever solved an actual performance problem with physical I/O devices.

I don't know enough about PostgresSQL to comment on buffering or recoverability and the like but I will say that choosing performance over recoverability is probably not a good practice.

I never want to find out that I have found a previously unknown recovery bug when my data is sitting in a tape on a shelf. I learned the DBA craft in the days before RAID - every night was a sleepless night. Always, always, always use RAID's.

If a user is complaining about performance, tell them you need a dozen SSD's to put in a 10 disk RAID5 array with hot failover. He'll leave with his head spinning but he might come back with a purchase order.

Good luck.

Best Answer

Related Solutions

Postgresql – How to access a old saved tablespace after reinstalling postgres

Identifying relations (lost pg_class)

TOAST tables

Transaction visibility, lost pg_clog and pg_controldata

Unwritten changes and lost pg_xlog

OK, so your data is a mess, what can you recover from it anyway?

Postgresql – How to recover the database after losing a tablespace with indexes

Related Question

Identifying relations (lost `pg_class`)

Transaction visibility, lost `pg_clog` and `pg_controldata`

Unwritten changes and lost `pg_xlog`