Apologies for the late response, but there is a key reason why you are not observing the smooth behavior you expect: you have a client-server model violation when you eliminate think time.
The client-server model is built on the assumption that not all users are consuming system resources at the same time. This differs from the mainframe model with terminals, where all users are serviced simultaneously by the central host. With client-server, processing is distributed and there is a delay between requests during which resources are freed to service other inbound users. Normally, when you place a host under load with think time, these free intervals become smaller and smaller, and as load increases you eventually reach a tipping point where no free resources remain to service requests.
By eliminating think time you eliminate the free resource window for servicing additional requests, and your tests become invalid as a predictor of real-world performance. If you ever hear a performance tester tell you that they have eliminated think time and that one virtual user is equal to n real-world users, fire them on the spot.
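As a sketch of what a load test with think time looks like, here is a hypothetical custom pgbench script (the table is pgbench's default pgbench_accounts; the 2-second pause and the client count below are illustrative assumptions; recent pgbench versions use \set ... random(), older ones \setrandom):

```
-- think_time.sql: one request followed by client-side think time
\set aid random(1, 100000)
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
\sleep 2s
```

Run with, e.g., pgbench -c 50 -T 300 -f think_time.sql mydb. Deleting the \sleep line is exactly the think-time elimination described above: every client hammers the server back to back, and the results no longer predict real-world behavior.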
The widely used tool is the SQL command EXPLAIN ANALYZE, possibly with more options for more detail in the output. It shows the query plan with the planner's estimates plus actual execution times.
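For more detail, EXPLAIN takes additional options in parentheses; for example (my_tbl and the filter are placeholders):

```sql
EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT * FROM my_tbl WHERE id < 100;
```

BUFFERS adds shared-buffer hit/read counts per plan node, VERBOSE adds output column lists.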
Why would you want to clear the cache? In real life, a populated cache is the far more common case. If you still want to go that route, here is a related answer on SO.
Without resetting the cache, here are two simple ways to test with many iterations:
Simple UDF
EXPLAIN ANALYZE
SELECT f_myfunc(g) FROM generate_series (1,1000) AS t(g);
Or with random input - random numbers between 0 and 5000 in the example:
EXPLAIN ANALYZE
SELECT f_myfunc((random()*5000)::int) FROM generate_series (1,1000) AS t(g);
Or with a real life table:
EXPLAIN ANALYZE
SELECT f_myfunc(my_column) FROM my_tbl; -- LIMIT n
More complex functions / queries
CREATE FUNCTION f_test(ct int, sql text) RETURNS void AS
$func$
DECLARE
   i int;
BEGIN
   FOR i IN 1 .. ct LOOP
      EXECUTE sql; -- not safe against SQLi!
   END LOOP;
END
$func$ LANGUAGE plpgsql;
Call:
EXPLAIN ANALYZE
SELECT f_test(100, $x$SELECT * FROM MADLIB.gp('mock3', '{x1, x2, x3}', '{y1}', 100, 20, 3)$x$);
Careful: The query is actually executed!
Careful: Not fit for public use. Possible SQL injection.
Again, you can use random parameters if needed, possibly with the USING clause of EXECUTE.
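A minimal sketch of that USING variant, which passes parameter values safely instead of concatenating them into the query string (the query, iteration count, and value range are illustrative, reusing f_myfunc from above):

```sql
DO
$do$
BEGIN
   FOR i IN 1 .. 100 LOOP
      -- $1 is bound to the value supplied after USING, no string concatenation
      EXECUTE 'SELECT f_myfunc($1)' USING (random() * 5000)::int;
   END LOOP;
END
$do$;
```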
Best Answer
http://www.postgresql.org/docs/9.4/static/pgbench.html
max_connections and shared_buffers seem really high for the server you have. 25% of RAM for shared_buffers would be more reasonable, and getting a connection pooler like pgbouncer to handle lots of connections would be much more productive.
You might find this useful as well:
https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
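To make the 25% guideline concrete, for a hypothetical server with 16 GB of RAM the relevant postgresql.conf settings might look like this (illustrative values, not tuned for your hardware):

```
shared_buffers = 4GB       # ~25% of RAM
max_connections = 100      # keep modest; let pgbouncer pool the rest
```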
Hope that helps. =)