PostgreSQL – How Slow is Slow?

postgresql

I'm working with postgres for the first time. I have a lot of experience with small / medium size data analysis (i.e. things that fit in ram and can be analyzed in R, Stata, Matlab, etc.), but am now working with big data (300-750gb) for the first time.

As a result, I have no concept of how long things should be taking. I think my database is performing very poorly, but having never worked on these scales I don't really know.

So here's my question: even basic queries are taking me at least 8 hours on a 237gb table. Vacuum takes ~6 hours. And a query pulling out distinct pairs of values:

CREATE TABLE UserPairs AS
SELECT DISTINCT a, group_a, sum(quantity) FROM cdr GROUP BY (a, group_a) HAVING type = 'DATA' AND group_a IS NOT NULL;

ran for 8 hours before I aborted.

An attempt to build 4 hash indices over main columns ran for 24 hours then crashed.

Hardware: 3 cores, 12 gb ram Windows 8 server VM. (I know, but I don't have control over my hardware. Long story).

So basically: within an order of magnitude, how long should I expect basic queries to take in postgres for tables of this size?

And if this seems way off, how do I get more precise benchmarks? I'm running pgbench now, but can't find resources on how to interpret the results. This listserv exchange suggests there aren't any repositories of results…

  • 1.5 billion rows.
  • Settings are default, looks like work_mem = 1mb, maintanence_work_mem = 16mb.
  • IO System: it's a VM so the disk is just listed as a "VMware virtual disk SCSI disk device", but Performance Monitor says avg Disc sec/transfer is 0.010sec and average disk queue length is 2.16.

Best Answer

You have to first set expectations - a screen that does such and such activities should complete each action in 1 second and all actions in 5 seconds and so on. For example, a search screen should retrieve results in 3 seconds, the booking actions (ticket booking) should be completed in 30 seconds etc.

Then work towards meeting those targets. That is the "normal" performance you want. Now go about meeting those targets. The database may be your bottleneck, it may not be. To identify issues at the database side, try using a tool like pgbadger. That will tell you which queries are taking time.

By the way, 8 hours for a query is probably not acceptable under any circumstance. Try the tool pgtune and see if there is scope for optimizing parameters.