Postgresql – Replace part of strings in multiple columns and tables

postgresqlpostgresql-9.5

I have several fields in db containing url inside different strings, part of which I need to replace with another one.
F.e. the string is

abc::http://localhost:5000/pic/img/1.jpg|def:::blahblahblah_http://localhost:5000/hij/klm/hxyhgbcbxjgxz.jpeg:jgfgv/hgh/

I have many different strings like that.

I need to replace http://localhost:5000 with https://example.com but leave everything else untouched.

Also I need to do that without stopping the database / on a working database.

I was thinking about doing smth like that:

UPDATE table01 
  SET field01 = replace(field01, 'http://localhost:5000', 'https://example.com')

How can I make this query include all strings in db, not only in table01 & field01, but all tables & all fields? Just want to make sure that there are no references left to this url at all.

Best Answer

First of all, your UPDATE without WHERE clause is needlessly (very!) expensive. It would write a new row version for all rows, even where nothing changes. Only update rows that actually need an update!

UPDATE table01 
SET    field01 = replace(field01, 'http://localhost:5000', 'https://example.com')
WHERE  field01 ~ 'http://localhost:5000';

How do I (or can I) SELECT DISTINCT on multiple columns?

Next, careful what you replace. You'll want to avoid false positives. Test before you apply it to the whole database. Your current expression:

replace(field01, 'http://localhost:5000', 'https://example.com')

would also change strings that probably shouldn't be changed. Like 'http://localhost:50000/pic/img/1.jpg' (note: 50000 instead of 5000)

To avoid that particular case:

regexp_replace(field01, 'http://localhost:5000\M', 'https://example.com', 'g')

Using regular expressions instead. \M at the end of the string, per documentation,

matches only at the end of a word

You may want to do more, depending on your exact requirements. So:

UPDATE table01 
SET    field01 =  regexp_replace(field01, 'http://localhost:5000\M'
                                        , 'https://example.com', 'g')
WHERE  field01 ~ 'http://localhost:5000\M';

Then apply to all string columns in a table. Ideally, only update each row once, to make this as cheap as possible. This plpgsql function executes the update for one given table:

CREATE OR REPLACE FUNCTION f_replace_everywhere(_pattern text, _new_string text, _tbl regclass, OUT updated_rows int) AS
$func$
DECLARE
   -- basic string types, possibly extend with citext, domains or custom types:
   _typ  CONSTANT regtype[] := '{text, bpchar, varchar}';
   _sql  text;
BEGIN
   SELECT INTO _sql     -- build command
          format('UPDATE %s SET %s WHERE %s'
               , _tbl
               , string_agg(format($$%1$s = regexp_replace(%1$s, $1, $2, 'g')$$, col), ', ')
               , string_agg(col || ' ~ $1', ' OR '))
   FROM  (
      SELECT quote_ident(attname) AS col
      FROM   pg_attribute
      WHERE  attrelid = _tbl            -- valid, visible, legal table name 
      AND    attnum >= 1                -- exclude tableoid & friends
      AND    NOT attisdropped           -- exclude dropped columns
      AND    NOT attnotnull             -- exclude columns defined NOT NULL!
      AND    atttypid = ANY(_typ)       -- only character types
      ORDER  BY attnum
      ) sub;

   -- Test
   -- RAISE NOTICE '%', _sql;

   -- Execute
   IF _sql IS NULL THEN
      updated_rows := 0;                         -- nothing to update
   ELSE
      EXECUTE _sql
      USING _pattern, _new_string;

      GET DIAGNOSTICS updated_rows = ROW_COUNT;  -- Report number of affected rows
   END IF;
END
$func$  LANGUAGE plpgsql;

Call to apply to a single table:

SELECT f_replace_everywhere( 'http://localhost:5000\M'
                           , 'https://example.com'
                           , 'my_table');

Closely related with detailed explanation:

Replace empty strings with null values

Finally, apply to all relevant tables in our database. Don't touch system tables. You can loop through all tables based on pg_tables or information_schema.tables. See:

How to check if a table exists in a given schema

I'll leave that last step to you.

Related Solutions

Postgresql – String replace using concatenated strings from various columns

Depending on how you would want to deal with possible NULL values, concat_ws() is probably your safest and simplest way to go:

UPDATE tbl
SET filed1 = replace(field1, concat_ws(' ', field2, field3, field4), '')
WHERE filed1 IS DISTINCT FROM replace(field1, concat_ws(' ', field2, field3, field4), '')

concat_ws() ignores NULL values. With plain concatenation (||), one NULL field would make the whole pattern NULL. It was introduced with Postgres 9.1.

The added WHERE clause prevents empty updates. This enhances performance a lot if many rows wouldn't change anyway.

Postgresql – Grouping on any one of multiple columns in Postgres

Another recursive solution that:

first creates the adjacency list of the connected graph of ids,
then finds its transitive closure (this is the recursive part)
and then groups by (once) to find the connected components that each node belongs to
and joins to the table again and group by (again) to gather the values from all nodes of each connected component.

Initial data (copied from Jack Douglas' solution):

begin;
create schema stack;
set search_path=stack;

create table foo as
select *
from (values (1,'X','D','G','P')
           , (2,'F','D','L','M')
           , (3,'X','N','R','S')
           , (4,'Y','I','W',null)
           , (5,'U','Z','E',null) ) AS f(id,a,b,c,d);

The query:

with recursive 
  al (tail, head) as                     -- adjacency list 
  ( select f.id, g.id 
    from foo as f join foo as g
      on (f.a = g.a or f.b = g.b or f.c = g.c or f.d = g.d) 
  ),
  tc (tail, head) as                     -- transitive closure
  ( select * from al
    union distinct
    select f.tail, g.head 
    from al as f join tc as g on f.head = g.tail
  ) ,
  cc (head, ids) as                      -- group once
  ( select head, array_agg(distinct tail order by tail) as ids
    from tc
    group by head
  ) 
select                                   -- group twice
    ids,
    array_agg(distinct a order by a) as a,
    array_agg(distinct b order by b) as b,
    array_agg(distinct c order by c) as c,
    array_agg(distinct d order by d) as d
from
  cc join foo on cc.head = foo.id
group by ids ;

┌─────────┬───────┬───────┬─────────┬─────────┐
│   ids   │   a   │   b   │    c    │    d    │
├─────────┼───────┼───────┼─────────┼─────────┤
│ {1,2,3} │ {F,X} │ {D,N} │ {G,L,R} │ {M,P,S} │
│ {4}     │ {Y}   │ {I}   │ {W}     │ {NULL}  │
│ {5}     │ {U}   │ {Z}   │ {E}     │ {NULL}  │
└─────────┴───────┴───────┴─────────┴─────────┘

Cleanup:

rollback;

Best Answer

Related Solutions

Postgresql – String replace using concatenated strings from various columns

Postgresql – Grouping on any one of multiple columns in Postgres

Related Question