I have a really busy function that I need to optimize the best way I can. This function is just a nested select statement that is requested several times a sec by a legacy application.
The indexes are in place but I've noticed that it is only used after the first execution of the function. I think that the problem is that Postgres creates a general execution plan because of a parameter that it is highly exclusive in the majority of the cases, but it can be not as good sometimes.
When I test with EXPLAIN ANALYZE
after the first execution, the query runs really fast but the app sessions call the function only once and then are terminated. I need that the first execution use the actual optimized plan. Anyone can help?
We tried messing around with the connector driver that manage the connection pooling to issue a DISCARD TEMP
instead of DISCARD ALL
, so it could keep the cached plan of the sessions and the performance went through the roof, but I don't want to do that in a production environment.
We are on a Postgres 9.4 running on CentOS 6. I've tried running as a SQL function but it didn't help, it was actually faster as a plpgsql func. Here is the function code:
CREATE OR REPLACE FUNCTION public.ap_keepalive_geteqpid_veiid(
IN tcbserie bigint,
IN protocolo integer)
RETURNS TABLE(eqpid integer, veiid integer, tcbid integer, veiplaca character varying, veiproprietariocliid integer, tcbtppid integer, tcbversao character, veirpmparametro double precision, tcbconfiguracao bigint, tcbevtconfig integer, veibitsalertas integer, sluid integer, harid integer) AS
$BODY$
BEGIN
RETURN QUERY
SELECT teqp.eqpID,
teqp.eqpveiID AS veiID,
tcb.tcbID,
tvei.veiPlaca,
tvei.veiProprietariocliID,
tcb.tcbtppID,
tcb.tcbVersao,
tvei.veiRPMParametro,
COALESCE(COALESCE(NULLIF(tcb.tcbConfiguracao, 0), tcc.clcConfiguracaoBitsVeic), 0) AS tcbConfiguracao,
COALESCE(tcb.tcbevtConfig, 0) AS tcbevtConfig,
COALESCE(tvei.veiBitsAlertas, 0) AS veiBitsAlertas,
COALESCE(tvei.veisluID, 0) AS sluID,
COALESCE(tcb.tcbharID, 0) AS harID
FROM TabEquipamento teqp
INNER JOIN TabPacoteProduto tpp ON teqp.eqptppID = tpp.tppID
INNER JOIN TabComputadorBordo tcb ON teqp.eqptcbID = tcb.tcbID
INNER JOIN TabVeiculos tvei ON teqp.eqpveiID = tvei.veiID
LEFT JOIN TabCliente tcli ON tcli.cliid = tvei.veiProprietariocliID
LEFT JOIN TabClienteConfig tcc ON tcc.clcCliID = tcli.cliID
WHERE tcb.tcbserie = $1
AND teqp.eqpAtivo = 1
AND tpp.tppIDProtocolo = $2
AND tvei.veiBloqueioSinal = 0;
END
$BODY$
LANGUAGE plpgsql VOLATILE COST 100 ROWS 1;
Execution plan in the first execution:
"Function Scan on ap_keepalive_geteqpid_veiid (cost=0.25..0.26 rows=1 width=116) (actual time=3.268..3.268 rows=1 loops=1)"
"Planning time: 0.032 ms"
"Execution time: 3.288 ms"
Second execution:
"Function Scan on ap_keepalive_geteqpid_veiid (cost=0.25..0.26 rows=1 width=116) (actual time=0.401..0.402 rows=1 loops=1)"
"Planning time: 0.058 ms"
"Execution time: 0.423 ms"
Edit: Added an auto-explain output of the function with a unexpected result (at least for me). The auto-explain claims that postgres executed the function with the desired plain with only 0.230 ms but the function itself took 4.057 ms. I don't know if this is accurate.
< 2015-12-14 18:10:02.314 BRST >LOG: duration: 0.234 ms plan:
Query Text: SELECT teqp.eqpID,
teqp.eqpveiID AS veiID,
tcb.tcbID,
tvei.veiPlaca,
tvei.veiProprietariocliID,
tcb.tcbtppID,
tcb.tcbVersao,
tvei.veiRPMParametro,
COALESCE(COALESCE(NULLIF(tcb.tcbConfiguracao, 0), tcc.clcConfiguracaoBitsVeic), 0) AS tcbConfiguracao,
COALESCE(tcb.tcbevtConfig, 0) AS tcbevtConfig,
COALESCE(tvei.veiBitsAlertas, 0) AS veiBitsAlertas,
COALESCE(tvei.veisluID, 0) AS sluID,
COALESCE(tcb.tcbharID, 0) AS harID
FROM TabComputadorBordo tcb
INNER JOIN TabEquipamento teqp ON teqp.eqptcbID = tcb.tcbID
INNER JOIN TabPacoteProduto tpp ON teqp.eqptppID = tpp.tppID
INNER JOIN TabVeiculos tvei ON teqp.eqpveiID = tvei.veiID
LEFT JOIN TabCliente tcli ON tcli.cliid = tvei.veiProprietariocliID
LEFT JOIN TabClienteConfig tcc ON tcc.clcCliID = tcli.cliID
WHERE tcb.tcbserie = $1
AND teqp.eqpAtivo = 1
AND tpp.tppIDProtocolo = $2
AND tvei.veiBloqueioSinal = 0
Nested Loop Left Join (cost=1.29..18.65 rows=1 width=75) (actual time=0.226..0.230 rows=1 loops=1)
Join Filter: (tcc.clccliid = tcli.cliid)
Rows Removed by Join Filter: 3
-> Nested Loop Left Join (cost=1.29..17.57 rows=1 width=75) (actual time=0.205..0.209 rows=1 loops=1)
-> Nested Loop (cost=1.01..17.26 rows=1 width=71) (actual time=0.200..0.203 rows=1 loops=1)
-> Nested Loop (cost=0.72..16.80 rows=1 width=43) (actual time=0.097..0.098 rows=1 loops=1)
-> Nested Loop (cost=0.58..16.63 rows=1 width=47) (actual time=0.079..0.080 rows=1 loops=1)
-> Index Scan using ix_tabcomputadorbordo_tcbserie on tabcomputadorbordo tcb (cost=0.29..8.31 rows=1 width=35) (actual time=0.046..0.046 rows=1 loops=1)
Index Cond: (tcbserie = $1)
-> Index Scan using ix_tabequipamento_eqptcbid_eqpativo_eqptppid_eqpveiid on tabequipamento teqp (cost=0.29..8.31 rows=1 width=16) (actual time=0.030..0.031 rows=1 loops=1)
Index Cond: ((eqptcbid = tcb.tcbid) AND (eqpativo = 1))
-> Index Only Scan using ix_tabpacoteproduto_tppidprotocolo on tabpacoteproduto tpp (cost=0.14..0.16 rows=1 width=4) (actual time=0.015..0.015 rows=1 loops=1)
Index Cond: ((tppidprotocolo = $2) AND (tppid = teqp.eqptppid))
Heap Fetches: 1
-> Index Scan using pk_tabveiculos on tabveiculos tvei (cost=0.29..0.45 rows=1 width=32) (actual time=0.100..0.101 rows=1 loops=1)
Index Cond: (veiid = teqp.eqpveiid)
Filter: (veibloqueiosinal = 0)
-> Index Only Scan using pk_tabcliente on tabcliente tcli (cost=0.28..0.30 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=1)
Index Cond: (cliid = tvei.veiproprietariocliid)
Heap Fetches: 1
-> Seq Scan on tabclienteconfig tcc (cost=0.00..1.03 rows=3 width=8) (actual time=0.014..0.015 rows=3 loops=1)
< 2015-12-14 18:10:02.314 BRST >CONTEXTO: função PL/pgSQL ap_keepalive_geteqpid_veiid(bigint,integer) linha 4 em RETURN QUERY
< 2015-12-14 18:10:02.314 BRST >LOG: duration: 4.057 ms plan:
Query Text: SELECT * FROM ap_keepalive_geteqpid_veiid (tcbSerie := 8259492, protocolo:= 422);
Best Answer
I cleaned up and simplified a few minor details. That shouldn't change much for performance, yet. But the added
SET join_collapse_limit = 1
might:Notes
COALESCE
can take multiple parameters, no need to nest:The
CASE
expression I ended up using above should be a bit faster, yet.character
is suspicious as data type. It's the same aschar(1)
, I assume you are aware of that.Inside PL/pgSQL, column aliases that are not referenced in the same query are just for documentation. Only the names in the
RETURNS TABLE
clause are visible outside the function.COST 100
is the default for a user-defined function and probably way off for your case. 10000 is probably a better estimate, but unless you nest this function in an outer query, this has hardly any effect.I removed the parameter defaults (as discussed) and made the function
STRICT
, since the query would return nothing for any NULL input anyway.I simplified your table aliases and formatted some more to make it easier to read and play with. That last bit is largely a matter of taste and style.
Answer to question
As to your actual question: There are no (direct) hints for the query planner (optimizer) in PostgreSQL like there are in other RDBMS. Details in the Postgres Wiki on "OptimizerHintsDiscussion".
PL/pgSQL works with prepared statements internally. It will re-plan queries inside the function body with the given input parameters for the first couple of calls per session. Only if it finds that specific plans don't perform better than a generic plan it will then switch to a generic plan and keep that, which saves some overhead.
Details:
However, there are a couple of settings you can adjust. In particular, if you know the best query plan, you can force Postgres to take the order of joins in the
FROM
clause as given and not try to reorder (which can get expensive for many tables - and you have 6 of them), by setting thejoin_collapse_limit
. This will reduce the cost of planning the query. If done right, it will make the first couple of calls faster. If you mess it up, performance will suffer, of course.You could put a
SET LOCAL
as first command:Better yet, declare it as an attribute of the function itself like I did above. The effect of
SET LOCAL
in the function body would last till the end of the transaction, but, per documentation:Obviously, you need to get the sequence of joins in your
FROM
clause right yourself. And it has to be good for all possible combinations of parameters. Postgres will not try to optimize. (TheSTRICT
modifier simplifies a bit, because NULL values are excluded now.)Put tables with the most selective predicates first.
WARNING: Be aware that such optimizations may turn from helpful to obstructive after upgrades to your Postgres version or any major changes in your DB.
Related answer with more details:
Aside
I assume you are aware that there are a couple other possible effects that can make the first call slower than subsequent calls. Like populating cache memory and other things: