Postgresql – Postgres tablespace on mounted usb

postgresqltablespaces

I've got postgres 9.1 running on raspbian installed with apt-get.
I've got NTFS usb HDD mounted in /mnt/usb.
What I'd like to achieve is to create a tablespace pointing to /mnt/usb/postgres so I invoke:

CREATE TABLESPACE usbwd LOCATION '/mnt/usb/postgres';

and I get:

could not set permissions on directory "/mnt/usb/postgres": Operation
not permitted

which is understandable since the manual states that the target directory has to be empty (it is in my case) and owned by postgres linux user. That's what I have problem with and

sudo chown postgres:postgres /mnt/usb/postgres

changes nothing, meaning the ownership does not change. The /mnt/usb/postgres is owned by pi user, and I need to have it owned by postgres user. The /mnt/usb is also owned by pi user.
What am I missing ?

Best Answer

PostgreSQL won't run off NTFS on Linux. Format the drive as ext4.

Frankly, putting a database on a USB key on a tablespace isn't usually a great idea. The tablespace is unreadable without the rest of the database, and the rest of the database is useless without the tablespace. So you're creating a fragile system with two points of failure.

It's a much better idea to instead move the whole PostgreSQL data directory onto the USB drive.

Related Solutions

PostgreSQL Tablespaces – Is Tablespace Similar to Logical Volume?

Historically, PostgreSQL stored tables and indexes in individual files. Tablespaces are a means of placing multiple tables/indexes into a single file or group of related files in the same directory. Other database management systems use similar techniques, although tney can store multiple objects within a file. On PostgreSQL a tablespace is implemented as a directory that individual files are placed in, but the files are still one-per-database-object.

A table must reside in a single tablespace unless it is partitioned, in which case each partition must reside on a given tablespace. On some DBMS platforms you could add files to a tablespace as you suggest, although in the case case of PostgreSQL a tablespace is essentially just a directory for grouping files together. On systems where this is possible it could result in suboptimal I/O performance as individual disks could contain I/O hot spots. Probably, you would be better off setting multiple disks up in a RAID volume and placing the tablespace on that.

Depending on your workload you may wish to set up individual disk volumes with tablespaces on each volume and distribute parts of your database over the different volumes. The most trivial case is separating database logs onto a separate volume, but you could distribute data over multiple volumes for various reasons.

EDIT: Here is a link on PostgreSQL tablespaces.

Postgresql – How to access a old saved tablespace after reinstalling postgres

Recovering a tablespace if you've deleted the main data directory is difficult, because all the metadata about what's in the tablespace is gone. (Nor can you generally recover the main DB if you lose a tablespace.).

I really, really hope you had decent backups, because recovering your data will be difficult and time consuming. You'll need to learn a lot about PostgreSQL's innards.

Identifying relations (lost `pg_class`)

Pg's on-disk layout is minimalist, with almost all information about tables stored in the system catalogs, in tables like pg_class. That's necessary to properly support transactional DDL. The system catalogs are part of the main data directory.

There is a metapage for each table, but it won't tell you much.

The file layout of the tablespace is a tree of database-oids with relation-relfileid nodes within it. The mapping of relations relfileids to actual tables is stored ... in pg_class, in the main database.

So you have a collection of loose relation forks, with none of the metadata that tells you what they are. And it gets worse.

TOAST tables

As if that wasn't bad enough, lots of data is stored out-of-line in TOAST tables.

The only thing that connects TOAST tables to the main tables is the system catalogs.

Transaction visibility, lost `pg_clog` and `pg_controldata`

PostgreSQL records the status of transactions - which tx's are in progress, which ones committed, which rolled back, etc - in pg_clog. You guessed it, in the main database. The control file also resides in the main DB, and keeps track of things like the transaction ID wrap-around point.

Every tuple in a table has xmin and xmax fields, which together tell readers whether they should "see" a tuple when scanning the table. You've lost the records that give the transaction IDs in xmin and xmax meaning. So when you read your tables, you may get:

Duplicate rows where an UPDATE has marked the old row's xmax and created a new row.
Deleted rows
The results of rolled back transactions

Essentially, if you can read the table at all, it'll be a dirty read.

Unwritten changes and lost `pg_xlog`

PostgreSQL records changes to pg_xlog, then lazily applies them, possibly out of order, to the main relation forks. On crash it will replay these changes.

It doesn't matter if the on-disk status is inconsistent so long as the transaction logs are OK and the in-memory state in the buffer cache is fine.

You've lost the transaction logs, so your tables might be in a partially-replayed state with inconsistent/invalid data.

OK, so your data is a mess, what can you recover from it anyway?

Before you attempt to recover anything, make a complete copy of the original data. Do not change this. Any recovery attempts might make things worse, so you need to keep the originals.

Also, do not attempt any of this on a PostgreSQL install that contains anything you care about. initdb a throw-away PostgreSQL instance for the work. Also, you must be using exactly the same PostgreSQL version.

The first thing I'd do would be CREATE TABLE the original table structure. If you ever dropped columns or changed column types, you must repeat what you did exactly when you recreate the table. Once you're done, check the relfilenode that backs the table:

SELECT relname, relfilenode FROM pg_class WHERE relname = 'mytable';

You also need to find the associated TOAST table's relfileid.

Then I'd disconnect and stop PostgreSQL cleanly (pg_ctl stop -m smart) and try copying the lost table over to the file with the relfilenode of the newly recreated table. If the original had lots of forks (1234.1, 1234.2, etc) you need to copy and rename all of them. You might be able to match up which table is which based on file size, or by looking at strings for it.

Do the same for the TOAST table associated with the relation.

Then start PostgreSQL in standalone backend mode and attempt to read the table.

If you're really lucky it won't crash or ERROR, and will just give you an empty table or bogus results.

At this point, you then need to use pg_dirtyread to attempt to read the table data while ignoring transaction visibility. See the github repo.

It'd be quite nice to have a tool to dump a raw PostgreSQL database table standalone, but I don't think there's such a thing.

Others may have better suggestions about how to attempt recovery. It isn't something I've done, and I don't have time to test it out right now.

I suggest doing some more reading and research, see if anyone else has written about this already. Also post on pgsql-general@postgresql.org for further advice.

Best Answer

Related Solutions

PostgreSQL Tablespaces – Is Tablespace Similar to Logical Volume?

Postgresql – How to access a old saved tablespace after reinstalling postgres

Identifying relations (lost pg_class)

TOAST tables

Transaction visibility, lost pg_clog and pg_controldata

Unwritten changes and lost pg_xlog

OK, so your data is a mess, what can you recover from it anyway?

Related Question

Identifying relations (lost `pg_class`)

Transaction visibility, lost `pg_clog` and `pg_controldata`

Unwritten changes and lost `pg_xlog`