Postgresql – Should I write a scalar function or a setof returning function for an INSERT/UPDATE/DELETE which can return at most one row

functionsplpgsqlpostgresqlset-returning-functions

I want to embed certain INSERT, UPDATE or DELETE statements into a function, to enhance re-usability. These functions have in common that the are manipulating a single row (if it exists).

An example is the following function:

create function my_table_delete(_identifier int) returns setof int as
$BODY$
begin
    return query
    delete from    my_table
    where          identifier = _identifier
    returning      identifier;
end
$BODY$
language plpgsql volatile strict;

This function accepts an identifier, deletes the corresponding row and returns the same identifier in case the delete was successful.

I am quite unsure if I should implement this as a set-returning function (like I did above) or a scalar function.

My reasoning for preferring set-returning function is the following:

  1. In case of a scalar function, we will have a NULL return value in case the row could not be found (while we will have 0 rows in case of the set-returning function). Having to deal with NULLs is a pain.

  2. I'm working with sets (SQL is all about sets), so it makes sense to just keep doing that.

  3. Also, I prefer returning the identifier (1 row: successful, 0 rows: not successful) instead of a boolean (true: successful, false: not successful), as it would allow constructs such as:

with d(id) as
(
    select * from my_table_delete(1234)
)
insert into log_table(id, action)
select      d.id, 'DELETE'
from        d;

If I were to return a boolean, the statement above could become less elegant.

I'm interested whether those arguments make sense. I'm quite new to the embedding logic into the database, so I'm not sure if I am addressing this correctly.

What do you think?

On a slightly related note, I also wonder if I should define the function as following:

create function my_table_delete(_identifier int)
    returns table (identifier int)
as ...

This gives the output row's column a name, which is even more convenient to use.

Best Answer

First, let's assert that manipulating a single row is positively guaranteed by a UNIQUE or PRIMARY KEY constraint on my_table.identifier. Else it should be a table function to begin with.

Next: Should you write a function for such a simple DELETE at all? Or use the plain DELETE instead? Barely longer and less overhead. (Maybe your actual function is more sophisticated ...)

Then, while your function is that simple, a plain SQL function might serve:

CREATE FUNCTION my_table_delete(_identifier int)
  RETURNS TABLE(identifier int)
$func$
   DELETE FROM my_table t
   WHERE  t.identifier = _identifier
   RETURNING t.identifier;
$func$  LANGUAGE sql VOLATILE STRICT;

Also using RETURNS TABLE like you pondered: Note the table-qualified column name to avoid conflicts. All function parameters (including columns of a RETURNS TABLE clause) are visible in SQL commands in the function body (in PL/pgSQL as well as in SQL functions). You need to avoid conflicts between column names and parameters.

An SQL function has less overhead. But if you call the function many times in the same session, the PL/pgSQL variant may actually be faster since it manages SQL statements like prepared statements, meaning the query plan is saved and reused if some preconditions are met. Then again: much of the same effect is achieved with prepared statements. See:

As to your reasoning:

1. If you don't want to deal with NULL, then the case is settled. Make it a table function. But you may have to deal with NULL after all: Various query constructs (have to) convert "no row" to NULL.

2. sounds like an argument, but is just a logical fallacy. Does not carry weight.

3. If it's your personal preference, then the case is settled again. Can't argue with that. But a data-modifying CTE would work with the scalar function just as well:

WITH d(id) AS (
   SELECT my_table_delete(1234)
   )
INSERT INTO log_table(id, action)
SELECT  d.id, 'DELETE'
FROM    d
WHERE   d.id IS NOT NULL;

In summary: go with RETURNS TABLE, but it hardly matters for the given case.