PostgreSQL Benchmarking – How to Benchmark a PostgreSQL Query

benchmark, postgresql

I want to benchmark a query containing a user-defined function I have written for PostgreSQL. Are there any standard ways to run such a benchmark?

I know that timing can be turned on with \timing at the psql prompt, but ideally I'd like a script that takes care of everything automatically: running the query a few times, clearing the PostgreSQL cache after each run (probably by restarting the PostgreSQL service), and outputting the average running time (memory usage would be a plus).
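
For the timing part (leaving the cache reset aside), I imagine something roughly like this, with f_myfunc standing in for my function and the averaging done in plain plpgsql:

DO
$do$
DECLARE
   n        int := 5;            -- number of runs
   t0       timestamptz;
   total_ms numeric := 0;
BEGIN
   FOR i IN 1 .. n LOOP
      t0 := clock_timestamp();
      PERFORM f_myfunc(42);      -- the query under test
      total_ms := total_ms + 1000 * extract(epoch FROM clock_timestamp() - t0);
   END LOOP;
   RAISE NOTICE 'average runtime: % ms', round(total_ms / n, 3);
END
$do$;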

Best Answer

The widely used tool is the SQL command EXPLAIN ANALYZE, possibly with more options for more detail in the output. It prints the query plan with the planner's estimates plus actual execution times.
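
For instance, a quick sketch with f_myfunc standing in for your function: the VERBOSE and BUFFERS options add per-node output details and buffer usage to the plan.

EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT f_myfunc(42);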

Why would you want to clear the cache? The more common real-world scenario is a populated (warm) cache. If you still want to go that route, here is a related answer on SO.

Without resetting the cache, here are two simple ways to test with many iterations:

Simple UDF

EXPLAIN ANALYZE
SELECT f_myfunc(g) FROM generate_series (1,1000) AS t(g);

Or with random input - random numbers between 0 and 5000 in the example:

EXPLAIN ANALYZE
SELECT f_myfunc((random()*5000)::int) FROM generate_series (1,1000) AS t(g);

Or with a real-life table:

EXPLAIN ANALYZE
SELECT f_myfunc(my_column) FROM my_tbl;   -- LIMIT n

More complex functions / queries

CREATE FUNCTION f_test(ct int, sql text)
  RETURNS void AS
$func$
BEGIN
   FOR i IN 1 .. ct LOOP   -- loop variable i is declared implicitly
      EXECUTE sql;         -- dynamic SQL: not safe against SQL injection!
   END LOOP;
END
$func$ LANGUAGE plpgsql;

Call:

EXPLAIN ANALYZE
SELECT f_test(100, $x$SELECT * from MADLIB.gp('mock3', '{x1, x2, x3}', '{y1}', 100,20, 3)$x$);

Careful: The nested query is actually executed (100 times in this example)!
Careful: Not fit for public use; possible SQL injection.

Again, you can use random parameters if needed, possibly with the USING clause of EXECUTE.
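
A minimal sketch of that variant, with f_test_param and f_myfunc as hypothetical names; the random value is passed safely via USING instead of being concatenated into the SQL string:

CREATE FUNCTION f_test_param(ct int)
  RETURNS void AS
$func$
BEGIN
   FOR i IN 1 .. ct LOOP
      -- $1 in the command string refers to the value supplied with USING
      EXECUTE 'SELECT f_myfunc($1)'
      USING (random() * 5000)::int;
   END LOOP;
END
$func$ LANGUAGE plpgsql;

Call:

EXPLAIN ANALYZE
SELECT f_test_param(100);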