PostgreSQL – Tuning Postgres for single-connection use? Or is Postgres the wrong tool?

concurrency, configuration, database-tuning, postgresql

Any rules of thumb for work_mem, maintenance_work_mem, shared_buffers, etc. for a database that DOESN'T anticipate concurrent connections and that is doing lots of aggregate functions?

I'm a social scientist looking to use Postgres not as a database to be shared by multiple users, but rather as my own tool for manipulating a massive data set. I have 5 billion transaction records (600 GB in CSV) and want to pull out unique user pairs, estimate aggregates for individual users, etc.
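To make the workload concrete, the kinds of queries meant here might look like the following sketch. The table and column names (`transactions`, `sender_id`, `receiver_id`, `amount`) are hypothetical, just to illustrate the shape of the work:

```sql
-- Hypothetical schema: transactions(sender_id, receiver_id, amount, ts)

-- Pull out unique user pairs:
SELECT DISTINCT sender_id, receiver_id
FROM transactions;

-- Estimate per-user aggregates:
SELECT sender_id,
       count(*)    AS n_transactions,
       sum(amount) AS total_amount
FROM transactions
GROUP BY sender_id;
```

Both patterns are large sorts/hashes over the full table, which is why the memory settings below matter much more than connection-related settings.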

All the advice I can find online on tuning (this, this, this etc.) is written for people anticipating lots of concurrent connections. Anyone have basic rules of thumb for someone working on a database alone for data manipulation?

UPDATE:
– This also means almost no writing, except for the creation of new tables based on selections from the main table. (Apparently that's important — thanks Erwin!)

Update 2:
– I'm on a Windows 8 VM with 16 GB RAM, a SCSI VMware HD, and 3 cores, if that's important.

Best Answer

If possible, I would do the following: define two tablespaces, one for the data and one for the result tables. Set work_mem and maintenance_work_mem to 1 GB and 2 GB respectively. Set shared_buffers high only if you run two or more queries at the same time against the same set of data; if you don't, that shared memory will be more or less wasted.
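A minimal sketch of that setup, assuming hypothetical directory paths (adjust them to whatever disks your VM actually has; shared_buffers must be set in postgresql.conf and needs a restart, so it is shown as a comment):

```sql
-- Two tablespaces on (ideally) separate disks; paths are placeholders:
CREATE TABLESPACE data_space   LOCATION 'D:/pg/data_space';
CREATE TABLESPACE result_space LOCATION 'E:/pg/result_space';

-- Per-session memory settings, per the advice above:
SET work_mem = '1GB';
SET maintenance_work_mem = '2GB';

-- In postgresql.conf (requires a server restart):
-- shared_buffers = 2GB   -- keep modest for a single-session workload
```

Result tables can then be created with `CREATE TABLE ... TABLESPACE result_space AS SELECT ...`, keeping reads of the source data and writes of the results on different disks.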

The biggest problem you will have is that PostgreSQL uses only one core per session, so the easiest way to get more done is to run several queries in different sessions. The second problem will be disk I/O; several fast disks plus different tablespaces help. The third is the RAM settings. I would configure work_mem so that sessions × work_mem comes to somewhere around 10 GB, and keep maintenance_work_mem at 1–2 GB. maintenance_work_mem can be raised per session, like this: SET maintenance_work_mem = '6GB'; so you can set it higher when you do maintenance. You can also set work_mem for a session with SET work_mem = '6GB';
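The "several sessions" idea can be sketched like this: open two psql windows and give each its own chunk of the work. Table and column names are hypothetical; with two sessions, work_mem per session can be roughly half the total budget:

```sql
-- Session 1 (first psql window):
SET work_mem = '5GB';
CREATE TABLE user_pairs AS
SELECT DISTINCT sender_id, receiver_id
FROM transactions;

-- Session 2 (second psql window, running at the same time):
SET work_mem = '5GB';
CREATE TABLE user_totals AS
SELECT sender_id, sum(amount) AS total_amount
FROM transactions
GROUP BY sender_id;
```

Each session gets its own core, so two independent aggregations finish in roughly the time of the slower one instead of running back to back.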

Table definitions also affect the space used; see: https://stackoverflow.com/questions/2966524/calculating-and-saving-space-in-postgresql . Probably not needed, but with 600 GB of scientific data it may help a lot.
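The linked question is about alignment padding: ordering columns from widest to narrowest avoids wasted padding bytes per row. A sketch with hypothetical columns (sizes noted in the comments are Postgres's storage sizes for these types):

```sql
-- Columns ordered by descending alignment to minimize padding:
CREATE TABLE transactions (
    amount      double precision,  -- 8 bytes
    ts          timestamptz,       -- 8 bytes
    sender_id   integer,           -- 4 bytes
    receiver_id integer,           -- 4 bytes
    is_refund   boolean            -- 1 byte
);
```

A few saved bytes per row is negligible for small tables, but across 5 billion rows it can amount to tens of gigabytes of disk and I/O.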