PostgreSQL – How to improve performance on PostgreSQL when using multiple concurrent processes

Tags: concurrency, postgresql

We are seeing unsatisfactory query performance in our Python application, which runs several processes that use SQLAlchemy Core to access a PostgreSQL 9.2 database. We may have around 100–200 or more concurrent processes executing queries against the database.

We currently have no connection pooler such as pgpool-II or PgBouncer installed.

I have tested a query that can take on the order of 1–10 seconds from our application, but at most 20 ms when run via psql.

We believe this issue is related to the number of concurrent connections, particularly because of the behavior described above and because we see these performance hits when accessing different tables in the system. However, we are not sure of this and would readily accept alternate suggestions or solutions.

What is the usual way of handling concurrent connections from multiple processes against a single database server, in a way that will enable queries to execute quickly?

Best Answer

You sort of answer your own question when you say you have no pooling, but...
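For illustration only, here is a minimal sketch of what a pooler buys you, using a stand-in connect() factory instead of a real database driver. In practice you would not roll your own: PgBouncer in front of the server, or SQLAlchemy's built-in QueuePool (the pool_size / max_overflow arguments to create_engine), does this for real.

```python
import itertools
import queue


class SimplePool:
    """Toy connection pool: hands out at most max_size connections and
    makes further callers wait until one is returned. PgBouncer and
    SQLAlchemy's QueuePool apply the same cap, which stops 100-200
    processes from each holding their own PostgreSQL backend."""

    def __init__(self, connect, max_size=5):
        self._q = queue.Queue(maxsize=max_size)
        for _ in range(max_size):
            self._q.put(connect())  # pre-open the connections

    def acquire(self, timeout=None):
        # Blocks while every connection is checked out.
        return self._q.get(timeout=timeout)

    def release(self, conn):
        self._q.put(conn)


# Stand-in "connections" are just integers here (hypothetical factory);
# a real pool would call e.g. psycopg2.connect() instead.
ids = itertools.count()
pool = SimplePool(lambda: next(ids), max_size=3)
```

The point is that the fourth caller waits for a free connection instead of forcing the server to juggle hundreds of active backends at once.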

This is not an out-of-the-box answer; as with all client/database issues, you may need to do some work to determine exactly what is amiss.

Back up postgresql.conf, then change:

log_min_duration_statement = 0
log_destination = 'csvlog'              # log in CSV format
logging_collector = on                  # enable capturing of stderr and csvlog
log_directory = 'pg_log'                # directory where log files are written
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # log file name pattern
debug_print_parse = on
debug_print_rewritten = on
debug_print_plan = on
log_min_messages = info                 # debug1 for server versions prior to 8.4

Stop and restart your database server (a reload may not pick up all of the changes; logging_collector in particular requires a restart). Reproduce your tests, ensuring that the server and client clocks match and that you record the start times, etc.
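Once the server is back up, it is worth confirming from psql that the settings actually took effect before running the tests; a quick check:

```sql
-- confirm the logging settings are live
SHOW log_min_duration_statement;   -- expect 0
SHOW log_destination;              -- expect csvlog
SHOW logging_collector;            -- expect on; this one only changes on restart
```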

Copy the log file off and import it into an editor of your choice (Excel or another spreadsheet can be useful for advanced manipulation of the SQL and plans, etc.).
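Alternatively, csvlog output is designed to be loaded straight into a table, which can be more convenient than a spreadsheet for filtering by duration. The PostgreSQL documentation gives a postgres_log table definition for this purpose (the file path below is a placeholder for your actual log file):

```sql
CREATE TABLE postgres_log
(
  log_time timestamp(3) with time zone,
  user_name text,
  database_name text,
  process_id integer,
  connection_from text,
  session_id text,
  session_line_num bigint,
  command_tag text,
  session_start_time timestamp with time zone,
  virtual_transaction_id text,
  transaction_id bigint,
  error_severity text,
  sql_state_code text,
  message text,
  detail text,
  hint text,
  internal_query text,
  internal_query_pos integer,
  context text,
  query text,
  query_pos integer,
  location text,
  application_name text,
  PRIMARY KEY (session_id, session_line_num)
);

COPY postgres_log FROM '/full/path/to/pg_log/logfile.csv' WITH csv;
```

From there, ORDER BY log_time or filter on the message column to line up server-side timings with your client-side observations.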

Now examine the timings from the server side and note:

is the SQL reported on the server the same in each case?

if it is the same, you should see the same timings

is the client generating a cursor rather than passing plain SQL?

is the query arriving on the server when you believe it should?

is one driver doing a lot of casting/conversion between character sets, or implicit conversion of other types such as dates or timestamps?

and so on
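To compare server-side timings across many runs, the durations can also be pulled out of the csvlog mechanically. A small sketch, assuming the standard "duration: N ms" message format produced by log_min_duration_statement (the sample lines below are shortened, made-up examples; real csvlog rows have many more columns):

```python
import csv
import io
import re

# With log_min_duration_statement = 0, every statement logs a line whose
# message field contains "duration: <ms> ms  statement: <sql>".
DURATION_RE = re.compile(r"duration: ([0-9.]+) ms")


def statement_durations(csv_text):
    """Yield (duration_ms, message) for every logged statement."""
    for row in csv.reader(io.StringIO(csv_text)):
        for field in row:
            m = DURATION_RE.search(field)
            if m:
                yield float(m.group(1)), field
                break  # one duration per log row


# Shortened, made-up csvlog lines for illustration.
sample = (
    '2013-01-01 10:00:00.000 UTC,app,mydb,1234,LOG,00000,'
    '"duration: 1523.004 ms  statement: SELECT * FROM big_table"\n'
    '2013-01-01 10:00:01.000 UTC,app,mydb,1234,LOG,00000,'
    '"duration: 0.820 ms  statement: SELECT 1"\n'
)
slow = [d for d, msg in statement_durations(sample) if d > 100.0]
```

Sorting or bucketing the extracted durations makes it obvious whether the 1–10 second queries are actually slow on the server, or only appear slow from the client.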

The plan data will be included for completeness; this may reveal whether there are gross differences in the SQL submitted by the clients.