I would strongly recommend treating your database basically the same way you treat your application code. You can script your database out to its component parts, check those into source control, and then use the same labels and versions there that you use for your apps.
To get the objects into source control there are a number of tools you can use. Microsoft has a tool nicknamed Data Dude that works with Visual Studio. They're also preparing to release a new tool called SQL Server Data Tools (SSDT), again working with Visual Studio. My company, Red Gate Software, makes a tool that works with SSMS called SQL Source Control.
In terms of process, I wrote several chapters for the book Red Gate Guide to Team Development. It's available as a free download (or if you want to kill a tree you can purchase one from Amazon). I go into a lot more detail about working with databases in development teams there.
To be truly highly available you will need some logic built into the application to handle this. You can do this sort of thing with pg_pool, but what if pg_pool breaks?
I think there are many ways you could work this one, but the way I would do it would be:
Master DB (let's call it A), and two slaves (let's call them B and C). B pulls changes from A, and C pulls changes from B, in a cascading replication setup. This means that if A fails and is a goner, you promote B; there's no need to change C, as it's already a slave of B. When A is back, you make it a slave of C, and so on.
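As a sketch, in a 9.x-era streaming replication setup the cascading standby C would carry a recovery.conf pointing at B rather than A (host names, the replication user, and paths here are all illustrative):

```
standby_mode = 'on'
# C follows B, not A -- this is what makes the replication cascading
primary_conninfo = 'host=B port=5432 user=replicator'
# creating this file promotes the standby to a master
trigger_file = '/var/lib/postgresql/promote_me'
```

Promoting B after A dies is then just a matter of creating its trigger file (or running `pg_ctl promote` on it).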
You will need to make changes to your application so that it has some intelligence about the state of the database. This is typically done with a layer that manages the connections: it runs a test query (like "SELECT current_timestamp"), and if the query passes, all is fine and it hands the connection to your application to use. Each application will need to know a list of servers and the order to try them in, so the one that writes will use A first, then B. In the event the primary fails, it tries the secondary (and potentially a third, and so on) until it finds one where the test query works.
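A minimal sketch of such a connection layer, assuming a DB-API-style driver. The `connect` argument stands in for your driver's connect function (e.g. psycopg2.connect); the names and DSNs are illustrative, not a real API:

```python
# Sketch of a failover-aware connection layer. Any exception while
# connecting or while running the test query marks that server as
# unusable and we move on to the next one in the list.

def get_connection(servers, connect):
    """Try each server in order; return the first connection whose
    test query succeeds, else raise."""
    for dsn in servers:
        try:
            conn = connect(dsn)
            cur = conn.cursor()
            cur.execute("SELECT current_timestamp")  # the test query
            cur.fetchone()
            return conn
        except Exception:
            continue  # unreachable or unhealthy: try the next server
    raise RuntimeError("no database server available")
```

A writing client would pass the list `["A", "B"]` while read-only clients could pass `["C", "B", "A"]`, matching the ordering described above.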
With this in place, if C fails it's only a read-only node: the applications will just fail over to B or A, and start using C again if you get it back. If B fails, same deal, though you might want to repoint C at A. If A fails, the system goes read-only (but will still work for all read queries). Some human intervention is then needed to say "yep, A is gone" and promote B (which will only take seconds). Once B is promoted, the writers go through the connection layer: it tests A and fails, tests B and passes, so the application gets a connection to B, which is now a master, and writes are allowed again.
I appreciate that this is still not a full HA solution (as there is still a manual step, but it's a very simple and quick one), but I think we are going to have to wait for PostgreSQL 9.5 for that.
As I say, I don't think there is one dead-cert right answer to this, but this is how I would have done it.
Hope it helps
Strictly speaking, the term "stored procedures" refers to SQL procedures in Postgres, introduced with Postgres 11.
There are also functions, which do almost but not quite the same, and those have been there from the beginning.
Functions with `LANGUAGE sql` are basically just batch files with plain SQL commands in a function wrapper (and therefore atomic, always run inside a single transaction), accepting parameters. All statements in an SQL function are planned at once, which is subtly different from executing one statement after the other and may affect the order in which locks are taken.

For anything more, the most mature language is PL/pgSQL (`LANGUAGE plpgsql`). It works well and has been improved with every release over the last decade, but it serves best as glue for SQL commands. It is not meant for heavy computations (other than with SQL commands).

PL/pgSQL functions execute queries like prepared statements. Re-using cached query plans cuts off some planning overhead and makes them a bit faster than equivalent SQL statements, which may be a noticeable effect depending on circumstances. It may also have side effects.
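The two flavors can be sketched like this (the table and function names are made up for illustration; referring to parameters by name in an SQL function body requires Postgres 9.2 or later):

```sql
CREATE TABLE account (id int PRIMARY KEY, balance numeric NOT NULL);

-- Plain SQL function: a parameterized batch of SQL commands,
-- run atomically inside a single transaction.
CREATE FUNCTION transfer(from_id int, to_id int, amount numeric)
  RETURNS void
  LANGUAGE sql AS
$$
UPDATE account SET balance = balance - amount WHERE id = from_id;
UPDATE account SET balance = balance + amount WHERE id = to_id;
$$;

-- PL/pgSQL function: control flow as glue around SQL commands.
CREATE FUNCTION get_balance(acct_id int)
  RETURNS numeric
  LANGUAGE plpgsql AS
$$
DECLARE
   bal numeric;
BEGIN
   SELECT balance INTO bal FROM account WHERE id = acct_id;
   IF NOT FOUND THEN
      RAISE EXCEPTION 'account % not found', acct_id;
   END IF;
   RETURN bal;
END
$$;
```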
This carries the advantages and disadvantages of prepared statements, as discussed in the manual. For queries on tables with irregular data distribution and varying parameters, dynamic SQL with `EXECUTE` may perform better, when the gain from an execution plan optimized for the given parameter(s) outweighs the cost of re-planning.

Since Postgres 9.2, generic execution plans are still cached for the session, but a custom plan keeps being used as long as it is expected to be cheaper (see the manual). So we get the best of both worlds most of the time (less some added overhead) without (ab)using `EXECUTE`. Details in "What's new in PostgreSQL 9.2" of the PostgreSQL Wiki.

Postgres 12 introduces the additional server variable `plan_cache_mode` to force generic or custom plans. For special cases; use with care.

You can win big with server-side functions that prevent additional round-trips to the database server from your application. Have the server execute as much as possible at once and only return a well-defined result.
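A minimal sketch of the `EXECUTE` variant, with made-up table and function names; every call is planned afresh with the actual parameter value:

```sql
-- Hypothetical example: dynamic SQL is re-planned on every call,
-- so the planner can exploit the actual value of _cat.
CREATE FUNCTION count_in_category(_cat text)
  RETURNS bigint
  LANGUAGE plpgsql AS
$$
DECLARE
   ct bigint;
BEGIN
   EXECUTE 'SELECT count(*) FROM big_tbl WHERE category = $1'
   INTO ct
   USING _cat;  -- safe parameter passing, no SQL injection
   RETURN ct;
END
$$;
```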
Avoid nesting of complex functions, especially table functions (`RETURNS SETOF record` or `RETURNS TABLE (...)`). Functions are black boxes posing as optimization barriers to the query planner. They are optimized separately, not in the context of the outer query, which makes planning simpler but may result in less-than-perfect plans. Also, the cost and result size of functions cannot be predicted reliably.

The exception to this rule is simple SQL functions (`LANGUAGE sql`), which can be "inlined" if some preconditions are met. Read more about how the query planner works in this presentation by Neil Conway (advanced stuff).

In PostgreSQL, a function always automatically runs inside a single transaction. All of it succeeds or nothing does. If an exception occurs, everything is rolled back. But there is error handling ...
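For illustration (a made-up function), a PL/pgSQL `EXCEPTION` clause traps an error inside the function instead of letting it abort everything:

```sql
-- The EXCEPTION clause creates a subtransaction, so only the work of
-- this BEGIN ... END block is rolled back when the error is trapped.
CREATE FUNCTION safe_div(num numeric, den numeric)
  RETURNS numeric
  LANGUAGE plpgsql AS
$$
BEGIN
   RETURN num / den;
EXCEPTION WHEN division_by_zero THEN
   RETURN NULL;  -- substitute NULL instead of raising the error
END
$$;
```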
That's also why functions are not exactly "stored procedures" (even though that term is sometimes used, misleadingly). Some commands like `VACUUM`, `CREATE INDEX CONCURRENTLY` or `CREATE DATABASE` cannot run inside a transaction block, so they are not allowed in functions. (Nor in SQL procedures, yet, as of Postgres 11. That might be added later.)

I have written thousands of plpgsql functions over the years.
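For contrast, a sketch of a Postgres 11 procedure (made-up names): unlike a function, it can issue `COMMIT`, splitting its work into several transactions.

```sql
-- Delete expired rows in small batches, committing after each batch
-- so locks and bloat stay bounded. COMMIT here is legal only because
-- this is a procedure invoked via CALL, not a function.
CREATE PROCEDURE purge_in_batches()
  LANGUAGE plpgsql AS
$$
BEGIN
   LOOP
      DELETE FROM log_tbl
      WHERE ctid IN (SELECT ctid FROM log_tbl WHERE expired LIMIT 1000);
      EXIT WHEN NOT FOUND;  -- FOUND reflects the last DELETE
      COMMIT;
   END LOOP;
END
$$;

CALL purge_in_batches();
```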