Postgresql – How to benchmark query parsing & planning in Postgres

benchmarkexplainpostgresql

I'm trying to benchmark the query planning and parsing steps in PG:

However, it seems that the query planner is using some sort of cache – only the first execution of EXPLAIN ANALYZE seems to report a high value in the Planning time section. Subsequently, it is less than 1ms. How do I turn this off?
The docs clearly state that the parsing time is not taken into account by EXPLAIN ANALYZE. Is there any other command/tool that will report the time it took to parse a query?

The Planning time shown by EXPLAIN ANALYZE is the time it took to generate the query plan from the parsed query and optimize it. It does not include parsing or rewriting.

Best Answer

It is not caching the actual prepared plans (unless you are using prepared statements). But it does cache all the metadata it needs to look up in order to come up with the plan. So the original planning time includes the time it took to read in all of that metadata.

If you wish to defeat the caching for experimental purposes, the simplest way would be to simply close psql and re-open it between every experiment.

For instrumentation, You can time each step of the parse-plan-execute process using log_statement_stats or its kin. You can also set client_min_messages = log so that those reported statistics show up directly on the psql screen, rather than having to go fetch them from the server log file. You will need to be a superuser to set either of those.

Index md5 hash as `uuid`

Since it turns out that your value column is too big to be indexed and you decided to use md5 hashes instead (which is a good solution):

Character varying index overhead & length limit

Consider storing the md5 hash as uuid, which would be most efficient (smaller, faster storage). You can just cast the result to uuid:

md5(value)::uuid

Details in the last chapter of this answer:

Convert hex in text representation to decimal number

Your index would look like this:

CREATE INDEX eav_value_special_idx
ON eav_value_text_data (cast(md5(value) AS uuid), attribute);

Note the explicit cast() syntax in the index definition:

How to create an index on an integer json property in postgres

Your query can still use shorthand syntax:

SELECT count(*)
FROM   eav_value_text_data v 
JOIN   eav_attribute a ON a.id = v.attribute 
WHERE  a.data_type = 11 
AND    a.name = 'id' 
AND    md5(v.value)::uuid = md5('rs145368920')::uuid;

Postgresql – Postgres query cache analyzing table basis

Clearing up confusion about `IMMUTABLE`

So, for the next time pets_get_all(1337) is executed, the query will be fetched from the cache, the inner functions won't be executed and thus.

That's not true. In fact, even in the same query the result isn't typically "cached": which as far as I know isn't a correct description of what's going on. It's more of a planner reference trick. There is certainly no transactional memoization going on here. These are two different concepts and there is nothing about a function call especially one which may be inlined that implies what you're assuming.

IMMUTABLE and even STABLE does imply that the query-plan itself is cached. This means that future calls within the same transaction will not be re-planned.

CREATE FUNCTION stable_foo()
RETURNS void
AS $$
  SELECT pg_sleep(1);
$$
LANGUAGE sql
STABLE;


CREATE FUNCTION immutable_foo()
RETURNS void
AS $$
  SELECT pg_sleep(1);
$$
LANGUAGE sql
IMMUTABLE;

Just looking at the timing:

\timing

-- these will take 2 seconds.
SELECT * FROM immutable_foo() AS a, immutable_foo() AS b;
SELECT immutable_foo(), immutable_foo() FROM ( VALUES (1) ) AS t(x);
SELECT immutable_foo()::text || immutable_foo()::text;

-- these will take 1 second.
SELECT immutable_foo() FROM ( VALUES (1),(2) ) AS t(x);

-- but declared as `STABLE` like foo(), it takes two seconds.
SELECT stable_foo() FROM ( VALUES (1),(2) ) AS t(x);

So only under specific circumstances in the same query will a function call get optimized out.. However, you'll never find a situation where this happens across statement boundaries..

As far as I know, the contexts for which an immutable function is optimized away are undocumented. Historically, this is even something experienced folks have struggled with and there is a general try it and see mentality as to whether or not the optimization takes effect.

BEGIN;
  SELECT bar();
  SELECT bar();
END;

I prefer to think of IMMUTABILITY in terms of what it permits, things such as (a) functional indexes, (b) functional index predicates (c) check constraints and the like. They all require a function to return the same arguments given the same inputs to make sense -- and they'll check that they're given an immutable function. But, it doesn't follow that there is memoization that operates in the transaction, session, or server.

PgPool 2

If you do need an actually query-cache you should consider checking out PgPool 2 which is the industry-method of doing this.

It's essentially middleware (or a forward-cache depending on your age).

As it applies to you

Writing a system to exploit undocumented query optimizations is a bad idea. You're better of not doing this at all, stepping back and just using SQL. Sure, the results of a complex query are not cached. But the buffered heap fetches and plans are. If it's in the same transaction it's working off the same data.

I would find weak links in your application and ask questions about how to make them faster. What you're doing seems something akin to pre-optimization.

Best Answer

Related Solutions

PostgreSQL – Poor Query Performance Due to Recheck Condition on Index Scan

Index md5 hash as uuid

Postgresql – Postgres query cache analyzing table basis

Clearing up confusion about IMMUTABLE

PgPool 2

As it applies to you

Related Question

Index md5 hash as `uuid`

Clearing up confusion about `IMMUTABLE`