Postgresql – Replace part of strings in multiple columns and tables

postgresqlpostgresql-9.5

I have several fields in db containing url inside different strings, part of which I need to replace with another one.
F.e. the string is

abc::http://localhost:5000/pic/img/1.jpg|def:::blahblahblah_http://localhost:5000/hij/klm/hxyhgbcbxjgxz.jpeg:jgfgv/hgh/

I have many different strings like that.

I need to replace http://localhost:5000 with https://example.com but leave everything else untouched.

Also I need to do that without stopping the database / on a working database.

I was thinking about doing smth like that:

UPDATE table01 
  SET field01 = replace(field01, 'http://localhost:5000', 'https://example.com')

How can I make this query include all strings in db, not only in table01 & field01, but all tables & all fields? Just want to make sure that there are no references left to this url at all.

Best Answer

First of all, your UPDATE without WHERE clause is needlessly (very!) expensive. It would write a new row version for all rows, even where nothing changes. Only update rows that actually need an update!

UPDATE table01 
SET    field01 = replace(field01, 'http://localhost:5000', 'https://example.com')
WHERE  field01 ~ 'http://localhost:5000';

Related:

Next, careful what you replace. You'll want to avoid false positives. Test before you apply it to the whole database. Your current expression:

replace(field01, 'http://localhost:5000', 'https://example.com')

would also change strings that probably shouldn't be changed. Like 'http://localhost:50000/pic/img/1.jpg' (note: 50000 instead of 5000)

To avoid that particular case:

regexp_replace(field01, 'http://localhost:5000\M', 'https://example.com', 'g')

Using regular expressions instead. \M at the end of the string, per documentation,

matches only at the end of a word

You may want to do more, depending on your exact requirements. So:

UPDATE table01 
SET    field01 =  regexp_replace(field01, 'http://localhost:5000\M'
                                        , 'https://example.com', 'g')
WHERE  field01 ~ 'http://localhost:5000\M';

Then apply to all string columns in a table. Ideally, only update each row once, to make this as cheap as possible. This plpgsql function executes the update for one given table:

CREATE OR REPLACE FUNCTION f_replace_everywhere(_pattern text, _new_string text, _tbl regclass, OUT updated_rows int) AS
$func$
DECLARE
   -- basic string types, possibly extend with citext, domains or custom types:
   _typ  CONSTANT regtype[] := '{text, bpchar, varchar}';
   _sql  text;
BEGIN
   SELECT INTO _sql     -- build command
          format('UPDATE %s SET %s WHERE %s'
               , _tbl
               , string_agg(format($$%1$s = regexp_replace(%1$s, $1, $2, 'g')$$, col), ', ')
               , string_agg(col || ' ~ $1', ' OR '))
   FROM  (
      SELECT quote_ident(attname) AS col
      FROM   pg_attribute
      WHERE  attrelid = _tbl            -- valid, visible, legal table name 
      AND    attnum >= 1                -- exclude tableoid & friends
      AND    NOT attisdropped           -- exclude dropped columns
      AND    NOT attnotnull             -- exclude columns defined NOT NULL!
      AND    atttypid = ANY(_typ)       -- only character types
      ORDER  BY attnum
      ) sub;

   -- Test
   -- RAISE NOTICE '%', _sql;

   -- Execute
   IF _sql IS NULL THEN
      updated_rows := 0;                         -- nothing to update
   ELSE
      EXECUTE _sql
      USING _pattern, _new_string;

      GET DIAGNOSTICS updated_rows = ROW_COUNT;  -- Report number of affected rows
   END IF;
END
$func$  LANGUAGE plpgsql;

Call to apply to a single table:

SELECT f_replace_everywhere( 'http://localhost:5000\M'
                           , 'https://example.com'
                           , 'my_table');

Closely related with detailed explanation:

Finally, apply to all relevant tables in our database. Don't touch system tables. You can loop through all tables based on pg_tables or information_schema.tables. See:

I'll leave that last step to you.