PostgreSQL SSD Performance – Maximizing Performance on SSDs in PostgreSQL 9.3

postgresql, postgresql-9.3

I will have a huge PostgreSQL 9.3 database with many tables of more than 100M entries per table. The database will basically be read-only (once I fill all the necessary tables and build the indexes, there will be no more write operations on the DB) and single-user (I will run and benchmark multiple queries from localhost), since it will be used only for research purposes. Queries will always JOIN on integer DB fields.

I will probably buy an SSD (256-512 GB) for this purpose. I have not used an SSD for a DB before, so is there anything I should be afraid of? Can I put the entire DB on the SSD, or just the indexes? Is there any particular advice / tutorial required for tuning PostgreSQL for SSDs? Note that I have a good workstation with an i7 and 32 GB of RAM, so perhaps you can offer some advice there too.

Best Answer

so is there anything I should be afraid of?

Not having backups. Like any storage device, it can die. Keep backups.

If the data load is going to take ages, I'd back up the read-only DB once the data load is done, by stopping the server and copying the data directory. That way, if something went wrong, it would be easier to recreate later.

Can I put the entire DB on the SSD, or just the indexes?

If it fits, store the whole DB.

If it doesn't, put a tablespace on the SSD and use it to store the indexes and as many of the heavily queried tables as will fit.
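
As a minimal sketch of that layout, assuming the SSD is mounted at a path like /mnt/ssd (the path, table and index names below are placeholders, not from the question):

    -- Create a tablespace on the SSD mount point (path is a placeholder)
    CREATE TABLESPACE ssd LOCATION '/mnt/ssd/pgdata';

    -- Move an existing index onto the SSD (index name is hypothetical)
    ALTER INDEX my_big_table_pkey SET TABLESPACE ssd;

    -- Or build new indexes there directly
    CREATE INDEX my_big_table_join_col_idx
        ON my_big_table (join_col) TABLESPACE ssd;

Heavily queried tables can be moved the same way with ALTER TABLE ... SET TABLESPACE, until the SSD is full.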

Is there any particular advice / tutorial required for tuning PostgreSQL for SSDs?

Most of the benefits of SSDs are for OLTP write loads. The main advantage for read-only loads is fast seeks, and slardiere has covered that.

You might want to set effective_io_concurrency = 5 (or thereabouts) to reflect the fact that SSDs can do fast, heavily pipelined random reads, but it only affects bitmap heap scans, and in practice random_page_cost already incorporates that.

For a read-only load it doesn't make a ton of difference.
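
If you do want to experiment with it, here is a minimal sketch; it is a session-level setting in 9.3, so it is easy to toggle while benchmarking:

    -- Session-level; in 9.3 this only influences prefetching for bitmap heap scans
    SET effective_io_concurrency = 5;

    -- Check the value currently in effect
    SHOW effective_io_concurrency;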

For the initial data load, see:

Note that I have a good workstation with an i7 and 32 GB of RAM, so perhaps you can offer some advice there too.

Set a big maintenance_work_mem for the data load. I'd use at least 8GB.

Set a big work_mem for the querying work. Appropriate size depends a bit on query complexity. Start with 500MB and go up from there.

Bump up your checkpoint_segments (massively) for the initial data load.
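
Putting those settings together, a rough sketch for a loading/benchmarking session follows; the values are the starting points suggested above, not tuned figures, and checkpoint_segments = 64 is just an illustrative number:

    -- Generous memory for index builds during/after the data load
    SET maintenance_work_mem = '8GB';

    -- Generous per-sort/hash memory for the later analytical queries;
    -- note that every sort or hash node in a query can use this much
    SET work_mem = '500MB';

    -- checkpoint_segments cannot be set per session; raise it in
    -- postgresql.conf for the bulk load and reload the server, e.g.:
    --   checkpoint_segments = 64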

Remember to disable Linux virtual memory overcommit by setting vm.overcommit_memory = 2 (see the PostgreSQL manual: http://www.postgresql.org/docs/current/static/kernel-resources.html)