PostgreSQL – How to Recursively Update a String

postgresqlrecursive

I have a table with a column containing names like this:

id, employee
1, Mr. John Cole Thornton
2, Mr. Paul George Mckenzie
3, Mr. George Mick McDoughal
4, Ms. Emily Suzan Flemming
5, Mr. Alan Bourdillion Traherne

I have a second table with a list of first names, like this

id, first_name
1, Emily
2, John
3, George
4, Suzan
5, Paul
6, Alan
7, Mary
8, Mick
9, Bourdillion
10, Jim
11, Cole

And I want to remove the first names in the first table, in order to obtain this:

id, employee
1, Mr. Thornton
2, Mr. Mckenzie
3, Mr. McDoughal
4, Ms. Flemming
5, Mr. Traherne

No matter how many first names, I would like to remove them all without calling several times my first_names table, and I wonder if this would be possible without using a loop in a function.

I have tried a query like this :

WITH RECURSIVE name AS (   SELECT REPLACE(t1.employee, t2.first_name, '') sec_name 
                        FROM  t1, t2 
                        WHERE position(t2.first_name in t1.employee) > 0 ) 
SELECT sec_name FROM name ;

But I get as many output as there are first names in the input, like :

Id, sec_name
1, John Thornton
1, Cole Thornton
2, Paul Mckenzie
2, George Mckenzie
...

My Postgres version is 9.6 .

Any help will be much appreciated !

Best Answer

WITH RECURSIVE cte AS (
    SELECT employee, 1 id
    FROM t1
    UNION ALL
    SELECT REPLACE(employee, first_name, ''), id+1
    FROM cte
    JOIN t2 USING (id)
)
SELECT REGEXP_REPLACE(employee, ' +', ' ') employee
FROM cte 
WHERE id > ( SELECT MAX(id) 
             FROM t2 )

fiddle

If t2.id values do not start with 1 or have omissions, you must re-enumerate t2 records in CTE using ROW_NUMBER().


UPDATE.

Possible problem - some first_name is a substring of some employee (John and Johnson, for example). If so use spaces as additional wrappers:

WITH RECURSIVE cte AS (
SELECT employee || ' ' employee, 1 id
FROM t1
UNION ALL
SELECT REPLACE(employee, ' ' || first_name || ' ', ' '), id+1
FROM cte
JOIN t2 USING (id)
)
SELECT REGEXP_REPLACE(employee, ' +', ' ') employee
FROM cte 
WHERE id > ( SELECT MAX(id) 
             FROM t2 )

fiddle

PS. Trim excess trailing space if needed.