PostgreSQL UDF (User Defined Functions) overhead

functions, performance, plpgsql, postgresql, postgresql-performance

Disclaimer

The task may seem esoteric, but I nevertheless want to create a proof of concept of sorts.

The goal

My goal is to make a PostgreSQL database (version 10) expose an API to the application that uses it.

The API needs to take the form of a set of UDFs: all functions belong to the public schema, which is the only one accessible to the application. The tables and other stuff are hidden in a private schema. Almost like, you know, an object-oriented database.
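A minimal sketch of that layout (the role name app_user and the exact grants are my illustration, not part of the question):

-- Tables hidden in "private", API functions exposed in "public".
CREATE SCHEMA private;
REVOKE ALL ON SCHEMA private FROM PUBLIC;  -- the app cannot touch it directly
GRANT USAGE ON SCHEMA public TO app_user;  -- the app may only call the UDFs
-- SECURITY DEFINER functions in public run with their owner's privileges,
-- so they can still read the tables in private on the app's behalf.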
Here's why I'm trying to make it work:

  • It would decouple the database from the application, so you can restructure/optimize/denormalize the former with less risk of breaking the latter. You can even delegate its maintenance to another team or department (oh my)
  • An API formalizes the requirements for a service. A database surely is a service, but the traditional mechanism called migrations doesn't serve well for figuring out what's going on in there. Think of the hundreds or thousands of migrations that have collected over the years, some of them broken and never to work again, and…

Well, never mind.

The issue

When I tried to create some very simple functions (like one that gets all records from a table), I noticed that a function is always slower than the query it wraps. While this is totally acceptable and understandable by itself, the timing difference can be huge. Hence, unacceptable.

The example

I've got a table like this.

CREATE TABLE notifications (
    id SERIAL PRIMARY KEY,
    source_type INTEGER NOT NULL,
    content JSONB,
    created TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP(3)
);
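A comparable data set could be generated along these lines (a sketch; the exact payload shape is my assumption):

INSERT INTO private.notifications (source_type, content)
SELECT (random() * 10)::int,
       jsonb_build_object('data', repeat('x', 1000))  -- roughly 1 kB of JSONB
FROM generate_series(1, 120000);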

There are >120k records in it; the JSONB data is almost 1 kB per record, and there are no indexes.
Imagine we want to get all of them.
Here we do it with a plain query:

EXPLAIN (ANALYZE,VERBOSE,BUFFERS) SELECT * FROM private.notifications;
                                                         QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on private.notifications  (cost=0.00..16216.13 rows=120113 width=877) (actual time=0.015..496.473 rows=120113 loops=1)
  Output: id, source_type, content, created
  Buffers: shared hit=15015
Planning time: 0.063 ms
Execution time: 973.935 ms

496 ms (the Seq Scan's actual total time).
Now let's try to utilize a PL/pgSQL function like this one:

CREATE OR REPLACE FUNCTION notifications_get()
RETURNS SETOF private.notifications AS
$$
BEGIN
    RETURN QUERY SELECT * from private.notifications;
END
$$
LANGUAGE 'plpgsql' 
SECURITY DEFINER;

EXPLAIN (ANALYZE,VERBOSE,BUFFERS) SELECT * FROM notifications_get();

                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
Function Scan on notifications_get  (cost=0.25..10.25 rows=1000 width=48) (actual time=99.561..589.129 rows=120113 loops=1)
  Output: id, source_type, content, created
  Function Call: notifications_get()
  Buffers: shared hit=15015
Planning time: 0.045 ms
Execution time: 1091.698 ms

589 ms.
Obviously, the difference between the function and the plain query is those 99.5 ms spent getting the first record.
I've tried further optimizations (maybe naively):

  1. Adjust the planner's ROWS estimate to something more realistic, let's say 120k. It yields the same results (102.373..593.628); see the sketch after this list.
  2. Use the SQL language (fair enough, the query is simple). Surprisingly, the same results (95.760..595.746).
  3. Make the function STABLE. Should get better now, right? Nope. Same results (93.132..594.331).
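For reference, attempt 1 combined with attempt 3 would look something like this (a sketch; the question doesn't show the exact definitions tried):

CREATE OR REPLACE FUNCTION notifications_get()
RETURNS SETOF private.notifications AS
$$
BEGIN
    RETURN QUERY SELECT * FROM private.notifications;
END
$$
LANGUAGE plpgsql STABLE SECURITY DEFINER
ROWS 120000;  -- only informs the planner of the calling query

Note that ROWS only improves the row estimate of the outer query; it does nothing about the function's startup cost.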

Questions

  1. Is there anything else that can be done to make a function more performant (compared to a plain query)?
  2. Why did none of these tricks make a difference?
  3. What exactly are these first 100 ms? They aren't constant: when there were 20k rows in the table, the function would spend a mysterious 18-20 ms trying to do something first. So it obviously does something with every single row in the table. How can I reduce this waste or get rid of it altogether? Is that even possible?

PS

Another issue I faced was with a function that gets a record by its id: 0.25 ms vs. 0.025 ms. A ten-fold difference, but I more or less get where it comes from. Again, none of the optimization tricks listed above made a difference (and it seems they shouldn't).
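For reference, such a function might look like this (the name and signature are my assumption; the question doesn't show them):

CREATE OR REPLACE FUNCTION notifications_get_by_id(_id integer)
RETURNS SETOF private.notifications AS
$$
BEGIN
    RETURN QUERY SELECT * FROM private.notifications WHERE id = _id;
END
$$
LANGUAGE 'plpgsql'
SECURITY DEFINER;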

Best Answer

This is (almost) equivalent to your function in the question, but performs like a plain SELECT:

CREATE OR REPLACE FUNCTION notifications_get_faster()
  RETURNS SETOF private.notifications AS
$func$
SELECT * FROM private.notifications
$func$  LANGUAGE sql STABLE;

Almost, because it's not SECURITY DEFINER, which would prevent the desired effect.

Most notably, you will see a Seq Scan instead of the Function Scan in the query plan. That's what makes most of the difference.
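You can verify this with the same EXPLAIN as before (a sketch; only the relevant part of the plan is described):

EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT * FROM notifications_get_faster();
-- The plan now opens with "Seq Scan on private.notifications"
-- instead of "Function Scan on notifications_get_faster".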

Why?

None of your various attempts met all the conditions for inlining of table functions. This function does. In particular:

  • the function is LANGUAGE SQL

  • the function is not SECURITY DEFINER

  • the function is declared STABLE or IMMUTABLE

So Postgres can take the function's body and execute it without function call overhead ("function inlining"). This only adds a very minor extra planning cost compared to a plain SELECT.

Aside: don't quote the language name. It's an identifier.
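That is:

LANGUAGE plpgsql    -- identifier: preferred
LANGUAGE 'plpgsql'  -- string literal: deprecated, and it requires matching case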