Concatenate String Replacements in Amazon RDS – PostgreSQL Best Practices

amazon-rdspostgresql

I have a concatenation of string replacements in this way:

CREATE OR REPLACE FUNCTION norm(t text) RETURNS text AS $$
        declare t1 text;
        declare t2 text;
        ...
        declare t10 text;
        BEGIN
            select replace (lower(t), ',', ' ') into t1;
            select regexp_replace (t1, '[^0-9a-z\s-]', '', 'g') into t2;
            ...
            select regexp_replace (t10, 'n?oise?.*', '', 'g') into t10;
            RETURN trim(t10);
        END;
$$ LANGUAGE plpgsql;

At the moment it has 10 different operations and the number keeps on grow. Also it has the inconvenient that changing position to transformations forces to rename variables.

Is there a better way to manage this situation?

Best Answer

You can help yourself by having a replacements table:

CREATE TABLE replacements
(
    execution_order INTEGER NOT NULL PRIMARY KEY,
    search_regexp TEXT NOT NULL CHECK (search_regexp > ''),
    replace_by TEXT NOT NULL,
    flags TEXT NOT NULL DEFAULT 'g'
) ;

That you would populate with as many replacements you need to perform:

INSERT INTO replacements
    (execution_order, search_regexp, replace_by, flags)
VALUES
    (100, '[^0-9a-z\s-]', '', 'g'),
    (200, 'n?oise?.*', '', 'g') ;

And then change your function to use it:

CREATE OR REPLACE FUNCTION norm(t_in text) 
RETURNS text AS 

$body$
DECLARE
    t_out text ;
    rep record ;
BEGIN
    t_out := replace(lower(t_in), ',', ' ') ;
    FOR rep IN
        SELECT search_regexp, replace_by, flags
        FROM replacements
        ORDER BY execution_order
    LOOP
        t_out := regexp_replace(t_out, rep.search_regexp, rep.replace_by, rep.flags) ;
    END LOOP ;
    RETURN t_out ;
END ;
$body$ 

LANGUAGE plpgsql IMMUTABLE STRICT ;

And check the results:

SELECT
    norm('You don''t like diacriticals àèìòùáéíóú') AS n1,
    norm('You do like noiseless numbers') AS n2,
    norm('And whatever you like to try also, removing commas, , , ') AS n3;
n1                          | n2           | n3                                                      
:-------------------------- | :----------- | :-------------------------------------------------------
you dont like diacriticals  | you do like  | and whatever you like to try also  removing commas      

This won't be as fast as hard-coding all the changes in your function, but will allow for highest flexibility.


As an alternative, you can just change the code structure of your function, and reuse the same variable more than once1:

CREATE OR REPLACE FUNCTION norm(t_in text) 
RETURNS text AS 

$body$
DECLARE
    t_out text ;
BEGIN
    t_out := replace(lower(t_in), ',', ' ') ;
    t_out := regexp_replace(t_out, '[^0-9a-z\s-]', '', 'g') ;
    t_out := regexp_replace(t_out, 'n?oise?.*', '', 'g') ;
    -- Keep adding substitutions, or inserting them where appropriate
    RETURN t_out ;
END ;
$body$ 

LANGUAGE plpgsql IMMUTABLE STRICT ;

This is faster, and the best alternative if the number of substitutions is moderate.

You can check it at dbfiddle here


1) PLPGSQL is not a functional language that forces you to assign only once to vals, variables can be overwritten as many times as necessary. If you think in Scala terms, they're vars, not vals. In Java terms, they aren't immutable.