Historically, PostgreSQL stored tables and indexes in individual files. Tablespaces are a means of placing multiple tables/indexes into a single file or group of related files in the same directory. Other database management systems use similar techniques, although tney can store multiple objects within a file. On PostgreSQL a tablespace is implemented as a directory that individual files are placed in, but the files are still one-per-database-object.
A table must reside in a single tablespace unless it is partitioned, in which case each partition must reside on a given tablespace. On some DBMS platforms you could add files to a tablespace as you suggest, although in the case case of PostgreSQL a tablespace is essentially just a directory for grouping files together. On systems where this is possible it could result in suboptimal I/O performance as individual disks could contain I/O hot spots. Probably, you would be better off setting multiple disks up in a RAID volume and placing the tablespace on that.
Depending on your workload you may wish to set up individual disk volumes with tablespaces on each volume and distribute parts of your database over the different volumes. The most trivial case is separating database logs onto a separate volume, but you could distribute data over multiple volumes for various reasons.
EDIT: Here is a link on PostgreSQL tablespaces.
Recovering a tablespace if you've deleted the main data directory is difficult, because all the metadata about what's in the tablespace is gone. (Nor can you generally recover the main DB if you lose a tablespace.).
I really, really hope you had decent backups, because recovering your data will be difficult and time consuming. You'll need to learn a lot about PostgreSQL's innards.
Identifying relations (lost pg_class
)
Pg's on-disk layout is minimalist, with almost all information about tables stored in the system catalogs, in tables like pg_class
. That's necessary to properly support transactional DDL. The system catalogs are part of the main data directory.
There is a metapage for each table, but it won't tell you much.
The file layout of the tablespace is a tree of database-oids with relation-relfileid nodes within it. The mapping of relations relfileids to actual tables is stored ... in pg_class
, in the main database.
So you have a collection of loose relation forks, with none of the metadata that tells you what they are. And it gets worse.
TOAST tables
As if that wasn't bad enough, lots of data is stored out-of-line in TOAST
tables.
The only thing that connects TOAST
tables to the main tables is the system catalogs.
Transaction visibility, lost pg_clog
and pg_controldata
PostgreSQL records the status of transactions - which tx's are in progress, which ones committed, which rolled back, etc - in pg_clog
. You guessed it, in the main database. The control file also resides in the main DB, and keeps track of things like the transaction ID wrap-around point.
Every tuple in a table has xmin
and xmax
fields, which together tell readers whether they should "see" a tuple when scanning the table. You've lost the records that give the transaction IDs in xmin
and xmax
meaning. So when you read your tables, you may get:
- Duplicate rows where an
UPDATE
has marked the old row's xmax
and created a new row.
- Deleted rows
- The results of rolled back transactions
Essentially, if you can read the table at all, it'll be a dirty read.
Unwritten changes and lost pg_xlog
PostgreSQL records changes to pg_xlog
, then lazily applies them, possibly out of order, to the main relation forks. On crash it will replay these changes.
It doesn't matter if the on-disk status is inconsistent so long as the transaction logs are OK and the in-memory state in the buffer cache is fine.
You've lost the transaction logs, so your tables might be in a partially-replayed state with inconsistent/invalid data.
OK, so your data is a mess, what can you recover from it anyway?
Before you attempt to recover anything, make a complete copy of the original data. Do not change this. Any recovery attempts might make things worse, so you need to keep the originals.
Also, do not attempt any of this on a PostgreSQL install that contains anything you care about. initdb
a throw-away PostgreSQL instance for the work. Also, you must be using exactly the same PostgreSQL version.
The first thing I'd do would be CREATE TABLE
the original table structure. If you ever dropped columns or changed column types, you must repeat what you did exactly when you recreate the table. Once you're done, check the relfilenode that backs the table:
SELECT relname, relfilenode FROM pg_class WHERE relname = 'mytable';
You also need to find the associated TOAST table's relfileid.
Then I'd disconnect and stop PostgreSQL cleanly (pg_ctl stop -m smart
) and try copying the lost table over to the file with the relfilenode of the newly recreated table. If the original had lots of forks (1234.1
, 1234.2
, etc) you need to copy and rename all of them. You might be able to match up which table is which based on file size, or by looking at strings
for it.
Do the same for the TOAST table associated with the relation.
Then start PostgreSQL in standalone backend mode and attempt to read the table.
If you're really lucky it won't crash or ERROR
, and will just give you an empty table or bogus results.
At this point, you then need to use pg_dirtyread
to attempt to read the table data while ignoring transaction visibility. See the github repo.
It'd be quite nice to have a tool to dump a raw PostgreSQL database table standalone, but I don't think there's such a thing.
Others may have better suggestions about how to attempt recovery. It isn't something I've done, and I don't have time to test it out right now.
I suggest doing some more reading and research, see if anyone else has written about this already. Also post on pgsql-general@postgresql.org
for further advice.
Best Answer
PostgreSQL won't run off NTFS on Linux. Format the drive as ext4.
Frankly, putting a database on a USB key on a tablespace isn't usually a great idea. The tablespace is unreadable without the rest of the database, and the rest of the database is useless without the tablespace. So you're creating a fragile system with two points of failure.
It's a much better idea to instead move the whole PostgreSQL data directory onto the USB drive.