Depending on how you would want to deal with possible NULL values, concat_ws()
is probably your safest and simplest way to go:
UPDATE tbl
SET filed1 = replace(field1, concat_ws(' ', field2, field3, field4), '')
WHERE filed1 IS DISTINCT FROM replace(field1, concat_ws(' ', field2, field3, field4), '')
concat_ws()
ignores NULL
values. With plain concatenation (||
), one NULL
field would make the whole pattern NULL
. It was introduced with Postgres 9.1.
The added WHERE
clause prevents empty updates. This enhances performance a lot if many rows wouldn't change anyway.
Another recursive solution that:
- first creates the adjacency list of the connected graph of ids,
- then finds its transitive closure (this is the recursive part)
- and then groups by (once) to find the connected components that each node belongs to
- and joins to the table again and group by (again) to gather the values from all nodes of each connected component.
Initial data (copied from Jack Douglas' solution):
begin;
create schema stack;
set search_path=stack;
create table foo as
select *
from (values (1,'X','D','G','P')
, (2,'F','D','L','M')
, (3,'X','N','R','S')
, (4,'Y','I','W',null)
, (5,'U','Z','E',null) ) AS f(id,a,b,c,d);
The query:
with recursive
al (tail, head) as -- adjacency list
( select f.id, g.id
from foo as f join foo as g
on (f.a = g.a or f.b = g.b or f.c = g.c or f.d = g.d)
),
tc (tail, head) as -- transitive closure
( select * from al
union distinct
select f.tail, g.head
from al as f join tc as g on f.head = g.tail
) ,
cc (head, ids) as -- group once
( select head, array_agg(distinct tail order by tail) as ids
from tc
group by head
)
select -- group twice
ids,
array_agg(distinct a order by a) as a,
array_agg(distinct b order by b) as b,
array_agg(distinct c order by c) as c,
array_agg(distinct d order by d) as d
from
cc join foo on cc.head = foo.id
group by ids ;
┌─────────┬───────┬───────┬─────────┬─────────┐
│ ids │ a │ b │ c │ d │
├─────────┼───────┼───────┼─────────┼─────────┤
│ {1,2,3} │ {F,X} │ {D,N} │ {G,L,R} │ {M,P,S} │
│ {4} │ {Y} │ {I} │ {W} │ {NULL} │
│ {5} │ {U} │ {Z} │ {E} │ {NULL} │
└─────────┴───────┴───────┴─────────┴─────────┘
Cleanup:
rollback;
Best Answer
First of all, your
UPDATE
withoutWHERE
clause is needlessly (very!) expensive. It would write a new row version for all rows, even where nothing changes. Only update rows that actually need an update!Related:
Next, careful what you replace. You'll want to avoid false positives. Test before you apply it to the whole database. Your current expression:
would also change strings that probably shouldn't be changed. Like
'http://localhost:50000/pic/img/1.jpg'
(note:50000
instead of5000
)To avoid that particular case:
Using regular expressions instead.
\M
at the end of the string, per documentation,You may want to do more, depending on your exact requirements. So:
Then apply to all string columns in a table. Ideally, only update each row once, to make this as cheap as possible. This plpgsql function executes the update for one given table:
Call to apply to a single table:
Closely related with detailed explanation:
Finally, apply to all relevant tables in our database. Don't touch system tables. You can loop through all tables based on
pg_tables
orinformation_schema.tables
. See:I'll leave that last step to you.