PostgreSQL procedural languages overhead (plpython / plsql / pllua…)

plpgsqlplpythonpostgresql

I'm trying to find information about PostgreSQL user defined functions in procedural languages performance for real time tasks.

  1. How does they compare to builtin functions?
  2. Is there any difference (in overhead) how Postgres call / manage plpython vs plpgsql vs pllua functions (I'm interested in the Postgres integration / context / data transfer side, not the VM itself)?
  3. Is the context a big overhead? Can I use it for realtime data mapping (let's say 1000 queries/s))
  4. Is there any benefit of writing user defined functions in plpgsql then other pg/language? On the documentation they enumerate advantages, but I think they apply to all postgresql procedural languages.

Related findings:

Best Answer

  1. UDFs in interpreted languages are pretty much always slower than UDFs written in C or built-in functions, all other things being the same.

  2. Each language binding has different code to connect PostgreSQL to the language, with different degrees of optimisation, different ways of passing some data types, etc. So variation certainly exists. It shouldn't be huge unless you're passing a data type that gets very different handling by one language than another, e.g. one passes a hstore as a string, and another converts it to a dict.

  3. Unclear what "the context" is. Can you use it for "real time data mapping" ... well, depends on what the function does and if it's fast enough on the server it's running on, for the clients it's taking to, and for your requirements. How long is a piece of string? Benchmark.

  4. PL/PgSQL is simpler to write, and offers faster access to SQL. It's generally better when you need to wrap a little logic around a lot of SQL. It's very slow for mathematical operations and complex algorithms, so purely computational code in PL/PgSQL should be avoided whenever possible in favour of C, or a faster procedural language.

Speedups when re-implementing PL/PgSQL code in C can vary from neglible to over 1000 times. It all depends on what the code is actually doing.

(This kind of multi-question isn't well suited to Stack Exchange as it's harder to have a definitive answer)