PostgreSQL – How to figure out what planner cost constants to use in Postgres

database-tuning execution-plan performance postgresql

I'm using Postgres 8.4 right now. Performance has started to become an issue as our tables have grown in size and our queries in complexity, so I've started to look into some performance tuning, but I am not an expert at all in this stuff.

I've noticed that the manual mentions a few times that a good way to improve performance is to use better cost constants for the query planner, but it also says that there is no simple way to figure out what cost constants to use.

I think that the constants might be problematic right now, since estimated costs don't seem to be a steady multiple of actual execution time – the ratio varies from about 30x to about 600x, even after a recent VACUUM ANALYZE. (I don't know whether this is a valid way to check that the constants are set well; correct me if I'm wrong on that.)
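For context, this is the sort of check I've been doing; the table and the numbers below are made up just for illustration. EXPLAIN ANALYZE reports both the planner's cost estimate and the measured execution time:

    -- Hypothetical query, just to show where the two numbers come from:
    EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

    -- The output looks roughly like this; I've been comparing "cost"
    -- (arbitrary planner units) against "actual time" (milliseconds):
    --   Index Scan using orders_customer_idx on orders
    --     (cost=0.00..1240.50 rows=310 width=64)
    --     (actual time=0.050..41.200 rows=295 loops=1)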

So, I'd like to set better constants for the query planner. How do I go about doing this? Should I just tweak things up and down at random until it seems faster, or is there anything I should do that's more formal? Are there any guidelines based on hardware or anything?

By the way, if the answer is "Don't worry about it, other things will be way more important to improve first," that's fine – for my actual real-world case, I'll work on other stuff. But for the sake of others, it would still be good to know how one would improve the constants if everything else has been improved.

Best Answer

First, the costs are not supposed to relate directly to execution time. They're strictly relative: a plan that costs more should take longer to actually execute. You can adjust seq_page_cost in order to "tune" costs so that they're closer to milliseconds of execution, but IMHO that's a waste of time.
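If you want to see what you're currently running with, the pg_settings view (available in 8.4) lists the planner constants and their values:

    -- Show the planner cost constants and related settings:
    SELECT name, setting
    FROM pg_settings
    WHERE name LIKE '%_cost'
       OR name IN ('effective_cache_size', 'effective_io_concurrency');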

For 99% of users, there are only three cost constants worth caring about at all:

effective_cache_size: set this to 75% of available RAM on your system. Done.

random_page_cost: if you're using high-end SAN/DAS, lower this to 2.0. If you're using SSDs or on Amazon, lower it to 1.5. Otherwise, leave it alone.

effective_io_concurrency: if you're on Linux and have a drive array, raise this to the number of drives (or mirrored pairs) the database is stored on, to a maximum of 4. (A sample configuration covering all three settings follows below.)
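As a concrete sketch, here's what those three might look like in postgresql.conf for a hypothetical Linux box with 16GB of RAM and a four-drive SSD array; the numbers are illustrative, not a recommendation for your hardware:

    # postgresql.conf (illustrative values; adjust to your machine)
    effective_cache_size = 12GB    # roughly 75% of 16GB of RAM
    random_page_cost = 1.5         # SSDs: random reads cost little more than sequential
    effective_io_concurrency = 4   # four drives, which is also the suggested cap

Reload the server afterwards (e.g. SELECT pg_reload_conf()) for the changes to take effect.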

Tinkering with the other cost constants is not recommended unless you have a full performance test setup that can tell you whether a change is actually beneficial.
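If you do end up experimenting, note that the cost constants can be changed per session with SET, so you can compare plans without touching the server-wide configuration. A rough sketch, with a made-up query:

    -- Try a different value for this session only:
    SET random_page_cost = 1.5;
    EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

    -- Undo it (SET only affects the current session anyway):
    RESET random_page_cost;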