First of all, you must have a bufferpool that matches the page size of the tablespace. By the same token, if the temporary tablespace is the only one with the 32K page size, then that bufferpool is effectively for its exclusive use.
If other tablespaces share the 32K page size, only monitoring system performance will tell you whether a separate bufferpool for the temporary tablespace would be beneficial.
You can use `SELECT * FROM sysibmadm.mon_bp_utilization` to look at the bufferpool hit ratios, and `SELECT * FROM TABLE(mon_get_bufferpool(NULL,NULL))` to check page cleaner activity (`POOL_NO_VICTIM_BUFFER` and `POOL_DRTY_PG_STEAL_CLNS` should ideally be 0). If you see that a drop in the bufferpool hit ratio or high dirty-page contention coincides with temporary space use (a spike in `POOL_TEMP_DATA_L_READS` in `SELECT * FROM TABLE(mon_get_tablespace(NULL,NULL))`), creating a separate bufferpool of an appropriate size for the temporary tablespace in question might be useful.
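If monitoring does point that way, the separation itself is plain DDL. A minimal sketch, assuming a hypothetical bufferpool name `BP32K_TEMP`, a placeholder size, and a temporary tablespace called `TEMPSPACE32K`:

```sql
-- Create a dedicated 32K bufferpool for temporary data; the SIZE
-- (in pages) is a placeholder to be tuned from monitoring data.
CREATE BUFFERPOOL BP32K_TEMP SIZE 20000 PAGESIZE 32K;

-- Point the 32K temporary tablespace at the new bufferpool.
ALTER TABLESPACE TEMPSPACE32K BUFFERPOOL BP32K_TEMP;
```

After the change, re-run the same monitoring queries to confirm the hit ratio of the regular 32K bufferpool actually improves.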
I don't think you can do what you want here - certainly not without changes to PostgreSQL.
Tablespaces are not an independent datastore, and cannot be exchanged between different PostgreSQL instances. Making them so would be a huge change to PostgreSQL's core, with significant impacts on transactional DDL support, etc.
See this related answer I wrote on the topic, explaining some of the issues with attaching a tablespace to a different PostgreSQL instance: How do I access a old saved tablespace after reinstalling postgres?
If you were to create the entire database in a different tablespace, then you'd be able to avoid the issues with `pg_class` and `TOAST` tables ... but you'd still have to deal with the problems with `pg_clog`, `pg_xlog`, transaction visibility, etc.
`pg_xlog` issues could potentially be solved if you were able to copy the tablespace at a point after a checkpoint, before any further writes were performed, so the tablespace was "clean". PostgreSQL doesn't currently offer any assistance to do this, but I think it'd be possible to add with moderate code changes (at the cost of a stall in the production DB while you flushed it for the checkpoint and snapshotted it) by taking an exclusive hold on the internal xlog insert lock. Possibly do-able even as an extension function.
The `pg_clog` and `pg_control` problem is a lot harder. Transaction IDs are global across the entire PostgreSQL instance, so your tablespace (or whatever other collection of tables smaller than an entire instance) will have transaction IDs in the future, and/or with different committed/rolled-back statuses in the local `pg_clog` than they had in the original `pg_clog`. Badness ensues. The only way I can think of to deal with this is to `VACUUM FREEZE` all tables in the database, which will set their `xmin` and `xmax` to `FrozenTransactionId`, an xid defined in the source code that is always visible to all transactions. However, doing this without having unfrozen xids added in parallel requires that you take an `EXCLUSIVE` lock (blocking `INSERT`, `UPDATE` and `DELETE`) on all tables in the database for as long as it takes to `VACUUM FREEZE` and then snapshot the tablespace.
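To make the freezing step concrete, here's an illustrative sketch for a single table (hedged: on the PostgreSQL versions this answer targets, freezing rewrote `xmin`; newer releases instead mark rows frozen with a hint bit and leave `xmin` in place):

```sql
-- Illustrative only: freeze one table's rows.
CREATE TABLE demo (id int);
INSERT INTO demo VALUES (1);

-- xmin initially holds the inserting transaction's xid.
SELECT xmin FROM demo;

-- Freeze every row. Note that VACUUM cannot run inside a
-- transaction block, so it cannot simply be wrapped in the
-- EXCLUSIVE lock described above -- a real implementation would
-- need some other way to block writers while this runs for
-- every table in the database.
VACUUM (FREEZE) demo;
```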
So. Maybe it's possible to do what you want, but there's no way it's possible without production impact. You'd have to basically rewrite PostgreSQL to store a per-database `pg_clog` and `pg_xlog`, keep a per-database `pg_control`, and use per-database transaction IDs. Then add support for keeping these in a non-default tablespace. Once you'd done that, you'd have the exciting job of making this work with streaming replication and WAL-shipping based PITR, which currently rely on a single WAL stream as an ordered time-line of events. And you'd have to deal with the extra `fsync()` costs, etc., of per-DB WAL. Finally, you'd need to figure out how to make this work with PostgreSQL's shared relations, like `pg_database`, `pg_authid`, etc., that live in the `pg_global` tablespace - they are transactional tables that participate in transactions along with tables in the current database, and you'd have to find a way to keep that working when you now had two separate transaction IDs (global and per-database) that had to commit or roll back atomically.
This is quite an interesting problem, and I suggest raising it on the pgsql-general mailing list. Link back here for context (and so people can explain all the things I got wrong in the above explanation).
Personally I'd really like to see PostgreSQL able to handle individual databases or tablespaces more independently, because I'd really like to be able to use replication to stream only a particular DB, or a subset of a DB. Right now that's just not possible, so you must do frustrating things like log-shipping change data for high-churn transaction tables and low-churn important archive tables together, or separating them into completely separate Pg instances. It doesn't look easy to do; the new logical replication and BDR features should help, though.
Best Answer
Historically, PostgreSQL has stored tables and indexes in individual files. Tablespaces are a means of placing multiple tables/indexes into a single file, or into a group of related files in the same directory. Other database management systems use similar techniques, although they can store multiple objects within one file. In PostgreSQL a tablespace is implemented as a directory that individual files are placed in, but the files are still one per database object.
A table must reside in a single tablespace unless it is partitioned, in which case each partition resides in a single tablespace (and different partitions can be placed in different tablespaces). On some DBMS platforms you could add files to a tablespace as you suggest, but in the case of PostgreSQL a tablespace is essentially just a directory for grouping files together. On systems where adding files is possible, it can result in suboptimal I/O performance, as individual disks can develop I/O hot spots. You would probably be better off setting multiple disks up as a RAID volume and placing the tablespace on that.
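As a concrete illustration of the directory model (the path and names below are assumptions; the directory must already exist, be empty, and be owned by the postgres OS user):

```sql
-- A tablespace is registered as a directory location.
CREATE TABLESPACE fastspace LOCATION '/mnt/ssd0/pg_fastspace';

-- Tables placed in it still get one file (or set of files) each.
CREATE TABLE measurements (ts timestamptz, v double precision)
    TABLESPACE fastspace;

-- Shows the per-object file path under the tablespace directory.
SELECT pg_relation_filepath('measurements');
```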
Depending on your workload you may wish to set up individual disk volumes with tablespaces on each volume and distribute parts of your database over the different volumes. The most trivial case is separating database logs onto a separate volume, but you could distribute data over multiple volumes for various reasons.
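For example, assuming separate volumes mounted for index and temporary data (all names and mount paths here are hypothetical), the distribution might look like:

```sql
CREATE TABLESPACE ixspace  LOCATION '/mnt/disk1/pg_ix';
CREATE TABLESPACE tmpspace LOCATION '/mnt/disk2/pg_tmp';

-- Move a hot index onto its own volume.
ALTER INDEX orders_pkey SET TABLESPACE ixspace;

-- Send sorts and temporary files to the dedicated temp volume.
SET temp_tablespaces = 'tmpspace';
```

Moving a relation with `SET TABLESPACE` rewrites its files and takes a heavy lock, so it's best done in a maintenance window.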
EDIT: Here is a link on PostgreSQL tablespaces.