PostgreSQL – is it possible to use a plpgsql function in a bash script

functionsplpgsqlpostgresqlpostgresql-9.1scripting

Target : What i'm trying to achieve is to restore a partial backup of a database (only certain rows that satisfied certain conditions have been backed up).

Problem : Since between the partial-backup and the restore may pass some time, and certain records could be deleted, I need to perform an Upsert(Update if record exist, Insert if record doesn't exists). This operation in Mysql would have be done with REPLACE, but in PostgreSQL it doesn't exist. After some research I've found a good substitute of it(correct me if i'm wrong) in this function :

CREATE FUNCTION merge_db(key INT, data TEXT) RETURNS VOID AS
$$
BEGIN
    LOOP
        -- first try to update the key
        UPDATE db SET b = data WHERE a = key;
        IF found THEN
        RETURN;
        END IF;
        -- not there, so try to insert the key
        -- if someone else inserts the same key concurrently,
        -- we could get a unique-key failure
        BEGIN
            INSERT INTO db(a,b) VALUES (key, data);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- Do nothing, and loop to try the UPDATE again.
        END;
    END LOOP;
END;
$$
LANGUAGE plpgsql;

The problem with this is that I need to make dynamic the input parameters of the function, and also the UPDATE and the INSERT statements, in order to adapt it for every record and every table i'm going to upsert.

Question :

  • is it possible to integrate plpgsql functions in bash scripts in order to makes them more dynamic?

Best Answer

As you were hinted in comments above, your RBAR approach might be very inefficient. Consider the suggestions there.

Also, I am not going into details about the different approaches of UPSERT, as it is a very broad topic, especially when one wants to do concurrency-safe. PostgreSQL 9.5 helps a lot in this regard.

So, to your actual question: it is not directly possible writing PL/pgSQL 'commands' interspersed with bash, but the task is definitely solvable.

For this, you need to define a function like this (left out the complexity to show my point more concisely):

CREATE OR REPLACE FUNCTION dynamic_upsert(
    table_to_upsert name, 
    key_column name, 
    value_column name, 
    key integer, 
    value text
)
RETURNS void
LANGUAGE plpgsql
AS $BODY$
BEGIN
    ...
    EXECUTE $$ UPDATE $$ || quote_ident(table_to_upsert) || 
            $$ SET $$ || quote_ident(value_column) || $$ = $2 
               WHERE $$ || quote_ident(key_column) || $$  = $1 $$ 
        USING key, value;
    ...
END;
$BODY$;

Please note that this is fully dynamic in that you can choose any table, match against any column of it and modify any (other) column. The type of the key (integer) and the value (text) are fixed, though.

Now you can call this from a shell script like

psql -h $DBHOST -U $DBUSER -d $DBNAME \
     -c "SELECT dynamic_upsert($TABLE, $KEY_COL, $VAL_COL, $key, $value)"

If you are still RBARing, this will produce an additional overhead of connecting to the database for every single change, so be patient (or set up a connection pool).

Notes:

  • using concatenation, one has to be absolutely sure to quote the object names properly. This is here done using quote_ident(), which is the way to go if you are concatenating the dynamic statement. In newer versions (from 9.1), one would use the format() function to avoid the mess:
EXECUTE format($$ UPDATE %I SET %I = $2 WHERE %I = $1$$, 
               table_to_upsert, value_column, key_column) 
USING key, value;
  • here I am using dollar quoting when building the dynamic query. This allows me to use 'normal' syntax, as opposed to multiplicating single quotes, for example (not present in this example). This way most editors will highlight the statements nicely.