PostgreSQL – How to Tell if Database Tuning is Fruitful

database-tuningpostgresql

As a DBA, I regularly observe my database, tune up some queries (either getting the programmers to re-write it or add index or update statistics if that's what is needed, etc). I also tune up the server options, etc.

But what I'm not happy with is that I don't have good metrics to say whether my work is worthwhile or not.

On one off queries optimization I can tell whether I got the query to run faster or not. I can use logs and pgfouine to detect slow queries.

But is there overall ways to quantify how well the database is running, metrics to keep track of that will tell me if I'm generally improving things or not really making a difference, etc?

I'm looking for metrics captured at the database level, not at the application level.

Best Answer

How well the database is running is just how well SQL statements are running.

  • Pick a representative sample of SQL statements that hit each database on your server. Picking a sample is easy; picking a representative sample isn't. Ideally, a representative sample should include some SELECT, INSERT, UPDATE, and DELETE statements. But it might not be practical to run INSERT, UPDATE, and DELETE statements against a production database.
  • Put them under version control.
  • Run them pseudo-randomly against the server. The goal is to get representative performance, not to get the fastest performance. The last time I did this, I didn't have much time to spend on this stuff, so I hoped that running the statements somewhat randomly would be representative. I expected that sometimes they'd hit the cache, and sometimes they wouldn't, just like user queries. How you execute these queries can bias your results; be careful and thoughtful.
  • Time the execution. Save the timings in a database.
  • Plot the timings on a control chart. Done correctly, this will show you baseline system performance, random variation around the baseline, and patterns that suggest that something might have changed. If you improve overall server performance, for example, expect the pattern to shift toward the lower control limit.
  • Think about the right time to change your control limits. (Same reference)