PostgreSQL – Optimize PostgreSQL server settings for extremely wide tables

performance, postgresql, postgresql-9.4, postgresql-performance

We have a dedicated server running only PostgreSQL 9.4.4: 2 physical CPUs, 16 logical cores, 128 GB RAM.

On this server we have quite atypical data: several tables with approximately 500 columns each and between 0.5 and 2 million rows. All columns are of either smallint or integer data type. The typical operations we do (sketched in SQL after this list):

  • perform some aggregation function on one column, grouped by another column (this may well also be a window function)
  • read a subset of columns (e.g. 20) and all rows
  • transform an entire table into a new table (there are typically no joins to other tables; these are mostly simple CASE WHEN … THEN … END statements)
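
A minimal sketch of these operation shapes in SQL, assuming hypothetical names (wide_table, col_a, col_b); the real table and column names are not given in the question:

    -- aggregate one column grouped by another
    SELECT col_a, avg(col_b) AS avg_b
    FROM wide_table
    GROUP BY col_a;

    -- the same shape as a window function
    SELECT col_a, avg(col_b) OVER (PARTITION BY col_a) AS avg_b
    FROM wide_table;

    -- rewrite the whole table with a simple CASE expression
    CREATE TABLE wide_table_v2 AS
    SELECT id,
           CASE WHEN col_b > 0 THEN 1 ELSE 0 END AS col_b_flag
    FROM wide_table;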

There are at most two concurrent users. There is a single-column primary key, but it is almost never used in queries. There are no additional table constraints or indexes.

What would be the recommended configuration of the PostgreSQL server? Increasing shared_buffers to 60GB and work_mem to 10GB? Anything else?
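
For reference, a sketch of how such settings could be applied on 9.4 via ALTER SYSTEM; the values are simply the ones floated above plus an assumed effective_cache_size, not a verified recommendation:

    -- sketch only: values from the question, not tuned advice
    ALTER SYSTEM SET shared_buffers = '60GB';        -- takes effect only after a server restart
    ALTER SYSTEM SET work_mem = '10GB';              -- applies per sort/hash node, per query
    ALTER SYSTEM SET effective_cache_size = '96GB';  -- planner hint; ~75% of RAM (assumption)
    SELECT pg_reload_conf();                         -- reloads the settings that allow it

Note that work_mem can be allocated several times within a single query, so values this large are only safe with very few concurrent users, as is the case here.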

Note: using such wide tables is not bad design in this case; we really do have that many attributes per record. We use this data as input to in-database machine learning (MADlib) and as data pre-processing storage before using other statistical tools.

Best Answer

How many of those columns do you use for grouping? If it's relatively few, then I would recommend restructuring the data into a long format, where each grouping (category) column is kept as-is and each grouped-by (metric) column is instead melted into two columns, variable and value, similar to how R's reshape2::melt function works. For instance, a table:

id | cat1 | cat2 | metric1 | metric2

Would become:

id | cat1 | cat2 | variable | value
id | cat1 | cat2 | metric1  | value of metric1 column
...
id | cat1 | cat2 | metric2  | value of metric2 column

The table would become K times longer, with K being the number of metrics you'd like to melt. This can actually improve query performance if you add indexes on your category columns.
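
A sketch of one way to do this melt in SQL (LATERAL is available from 9.3 on, so it works on the 9.4 server here), assuming a hypothetical wide table wide(id, cat1, cat2, metric1, metric2):

    -- unpivot each metric column into (variable, value) rows
    CREATE TABLE tall AS
    SELECT w.id, w.cat1, w.cat2, v.variable, v.value
    FROM wide AS w
    CROSS JOIN LATERAL (VALUES ('metric1', w.metric1),
                               ('metric2', w.metric2)) AS v (variable, value);

    -- indexes on the category columns, as suggested above
    CREATE INDEX ON tall (cat1);
    CREATE INDEX ON tall (cat2);

Add one (variable, value) pair to the VALUES list per metric column you want to melt.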

If that doesn't improve performance, then I'd recommend using a different tool than Postgres, such as Apache Spark.