Since the selective predicate on the big table eav_value_text_data
big table is v.value = 'rs145368920'
, you need an index on value
more than anything else. The index on attribute
is hardly relevant - only in combination with the first to allow index-only scans if possible:
CREATE INDEX eav_value_text_data_val_att_idx ON eav_value_text_data (value, attribute);
This should make a huge difference.
value
should be the first column in the index. More:
Index md5 hash as uuid
Since it turns out that your value
column is too big to be indexed and you decided to use md5 hashes instead (which is a good solution):
Consider storing the md5 hash as uuid
, which would be most efficient (smaller, faster storage). You can just cast the result to uuid
:
md5(value)::uuid
Details in the last chapter of this answer:
Related:
Your index would look like this:
CREATE INDEX eav_value_special_idx
ON eav_value_text_data (cast(md5(value) AS uuid), attribute);
Note the explicit cast()
syntax in the index definition:
Your query can still use shorthand syntax:
SELECT count(*)
FROM eav_value_text_data v
JOIN eav_attribute a ON a.id = v.attribute
WHERE a.data_type = 11
AND a.name = 'id'
AND md5(v.value)::uuid = md5('rs145368920')::uuid;
Clearing up confusion about IMMUTABLE
So, for the next time pets_get_all(1337) is executed, the query will be fetched from the cache, the inner functions won't be executed and thus.
That's not true. In fact, even in the same query the result isn't typically "cached": which as far as I know isn't a correct description of what's going on. It's more of a planner reference trick. There is certainly no transactional memoization going on here. These are two different concepts and there is nothing about a function call especially one which may be inlined that implies what you're assuming.
IMMUTABLE
and even STABLE
does imply that the query-plan itself is cached. This means that future calls within the same transaction will not be re-planned.
CREATE FUNCTION stable_foo()
RETURNS void
AS $$
SELECT pg_sleep(1);
$$
LANGUAGE sql
STABLE;
CREATE FUNCTION immutable_foo()
RETURNS void
AS $$
SELECT pg_sleep(1);
$$
LANGUAGE sql
IMMUTABLE;
Just looking at the timing:
\timing
-- these will take 2 seconds.
SELECT * FROM immutable_foo() AS a, immutable_foo() AS b;
SELECT immutable_foo(), immutable_foo() FROM ( VALUES (1) ) AS t(x);
SELECT immutable_foo()::text || immutable_foo()::text;
-- these will take 1 second.
SELECT immutable_foo() FROM ( VALUES (1),(2) ) AS t(x);
-- but declared as `STABLE` like foo(), it takes two seconds.
SELECT stable_foo() FROM ( VALUES (1),(2) ) AS t(x);
So only under specific circumstances in the same query will a function call get optimized out.. However, you'll never find a situation where this happens across statement boundaries..
As far as I know, the contexts for which an immutable function is optimized away are undocumented. Historically, this is even something experienced folks have struggled with and there is a general try it and see mentality as to whether or not the optimization takes effect.
BEGIN;
SELECT bar();
SELECT bar();
END;
I prefer to think of IMMUTABILITY
in terms of what it permits, things such as (a) functional indexes, (b) functional index predicates (c) check constraints and the like. They all require a function to return the same arguments given the same inputs to make sense -- and they'll check that they're given an immutable function. But, it doesn't follow that there is memoization that operates in the transaction, session, or server.
PgPool 2
If you do need an actually query-cache you should consider checking out PgPool 2 which is the industry-method of doing this.
It's essentially middleware (or a forward-cache depending on your age).
As it applies to you
Writing a system to exploit undocumented query optimizations is a bad idea. You're better of not doing this at all, stepping back and just using SQL. Sure, the results of a complex query are not cached. But the buffered heap fetches and plans are. If it's in the same transaction it's working off the same data.
I would find weak links in your application and ask questions about how to make them faster. What you're doing seems something akin to pre-optimization.
Best Answer
It is not caching the actual prepared plans (unless you are using prepared statements). But it does cache all the metadata it needs to look up in order to come up with the plan. So the original planning time includes the time it took to read in all of that metadata.
If you wish to defeat the caching for experimental purposes, the simplest way would be to simply close
psql
and re-open it between every experiment.For instrumentation, You can time each step of the parse-plan-execute process using log_statement_stats or its kin. You can also
set client_min_messages = log
so that those reported statistics show up directly on the psql screen, rather than having to go fetch them from the server log file. You will need to be a superuser to set either of those.