PostgreSQL – How to Recursively Update a String

postgresql, recursive

I have a table with a column containing names like this:

id, employee
1, Mr. John Cole Thornton
2, Mr. Paul George Mckenzie
3, Mr. George Mick McDoughal
4, Ms. Emily Suzan Flemming
5, Mr. Alan Bourdillion Traherne

I have a second table with a list of first names, like this:

id, first_name
1, Emily
2, John
3, George
4, Suzan
5, Paul
6, Alan
7, Mary
8, Mick
9, Bourdillion
10, Jim
11, Cole

And I want to remove the first names in the first table, in order to obtain this:

id, employee
1, Mr. Thornton
2, Mr. Mckenzie
3, Mr. McDoughal
4, Ms. Flemming
5, Mr. Traherne

However many first names an employee has, I would like to remove them all without querying my first_names table several times, and I wonder whether this is possible without using a loop in a function.

I have tried a query like this:

WITH RECURSIVE name AS (
    SELECT REPLACE(t1.employee, t2.first_name, '') sec_name
    FROM t1, t2
    WHERE position(t2.first_name in t1.employee) > 0
)
SELECT sec_name FROM name;

But I get one output row per matching first name, with only that one name removed, like:

Id, sec_name
1, John Thornton
1, Cole Thornton
2, Paul Mckenzie
2, George Mckenzie
...

My Postgres version is 9.6.

Any help will be much appreciated!

Best Answer

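WITH RECURSIVE cte AS (
    -- Start with every employee at step 1
    SELECT employee, 1 id
    FROM t1
    UNION ALL
    -- At step N, strip the first name whose t2.id = N, then move on to step N+1
    SELECT REPLACE(employee, first_name, ''), id+1
    FROM cte
    JOIN t2 USING (id)
)
-- Keep only the rows that have passed through every first name;
-- the 'g' flag collapses every run of spaces, not just the first one
SELECT REGEXP_REPLACE(employee, ' +', ' ', 'g') employee
FROM cte
WHERE id > ( SELECT MAX(id)
             FROM t2 );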


The step counter is joined to t2.id, so if the t2.id values do not start at 1 or have gaps, you must renumber the t2 rows in a CTE using ROW_NUMBER().
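
A rough sketch of that renumbering follows. The CTE name names and the columns rn and emp_id are illustrative and do not appear in the original queries. The cast to text is an assumption, added in case employee is declared as varchar: REPLACE returns text, and both branches of a recursive CTE must agree on the column type.

```sql
WITH RECURSIVE names AS (
    -- Renumber the first names 1..N regardless of the actual t2.id values
    SELECT first_name, ROW_NUMBER() OVER (ORDER BY id) AS rn
    FROM t2
), cte AS (
    -- Step 1: every employee, untouched (emp_id keeps the original t1.id)
    SELECT t1.id AS emp_id, t1.employee::text AS employee, 1::bigint AS rn
    FROM t1
    UNION ALL
    -- Step rn: strip first name number rn, then advance the counter
    SELECT cte.emp_id, REPLACE(cte.employee, names.first_name, ''), cte.rn + 1
    FROM cte
    JOIN names USING (rn)
)
-- Rows with rn = N + 1 have had all N first names applied
SELECT emp_id, REGEXP_REPLACE(employee, ' +', ' ', 'g') AS employee
FROM cte
WHERE rn > (SELECT COUNT(*) FROM t2)
ORDER BY emp_id;
```

Carrying emp_id through the recursion is optional, but it keeps each cleaned name attached to its original row.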

UPDATE.

Possible problem: a first_name may be a substring of another word in employee (John inside Johnson, for example). If so, wrap the name in spaces when matching, and append a trailing space to employee so that the last word can match too:

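WITH RECURSIVE cte AS (
    -- The appended space lets ' name ' match a name at the end of the string
    SELECT employee || ' ' employee, 1 id
    FROM t1
    UNION ALL
    -- Replace only whole words: the name must have a space on both sides
    SELECT REPLACE(employee, ' ' || first_name || ' ', ' '), id+1
    FROM cte
    JOIN t2 USING (id)
)
SELECT REGEXP_REPLACE(employee, ' +', ' ', 'g') employee
FROM cte
WHERE id > ( SELECT MAX(id)
             FROM t2 );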


PS. Trim excess trailing space if needed.
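
A minimal sketch of that trimming step, applied to a literal that stands in for one row coming out of the CTE (the literal value is illustrative only):

```sql
-- 'Mr.  Thornton ' mimics an intermediate result: doubled space, trailing space
SELECT TRIM(REGEXP_REPLACE('Mr.  Thornton ', ' +', ' ', 'g')) AS employee;
-- returns 'Mr. Thornton'
```

In the full query, the same TRIM(...) wrapper would go around the REGEXP_REPLACE call in the final SELECT.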

Related Solutions

SQL Server – Optimally Find the End of a Relationship Chain

I don't have your dataset, so can't test if this is better, but it feels better. After building the chains, we reverse them to find the 'childest' item in each chain, then join back to the original chain.

-- The same as your first CTE
;WITH RelationshipChain AS (
    SELECT ID, ParentID, ChildID, 0 AS Seq, ID AS RootID
    FROM Relationships
    WHERE ParentID = ''
    UNION ALL
    SELECT r2.ID, r2.ParentID, r2.ChildID, rc.Seq + 1 AS Seq, rc.RootID AS RootID
    FROM Relationships r2
    INNER JOIN RelationshipChain rc ON rc.ChildID = r2.ID
)

-- This CTE is new and reverses the sense of the chains
, BackChain AS (
    SELECT ID, RootID, ROW_NUMBER() OVER (PARTITION BY RootID ORDER BY Seq DESC) BackSeq
    FROM RelationshipChain
)

-- Now we join each chain to the childest item in the reverse chain
SELECT
    rc.ID
    , rc.ParentID
    , rc.ChildID
    , rc.Seq
    , rc.RootID
    , bc.ID ChainEndID
FROM RelationshipChain rc
INNER JOIN BackChain bc ON rc.RootID = bc.RootID AND bc.BackSeq = 1
ORDER BY rc.RootID, rc.Seq

Results with your sample data:

ID   ParentID ChildID Seq         RootID ChainEndID
---- -------- ------- ----------- ------ ----------
0001          0003    0           0001   0007
0003 0001     0005    1           0001   0007
0005 0003     0006    2           0001   0007
0006 0005     0007    3           0001   0007
0007 0006             4           0001   0007
0002          0004    0           0002   0004
0004 0002             1           0002   0004

Indexes will help performance.

MySQL – Hierarchical Storage and Query

Since the depth is variable and an item can have multiple parents, I recommend a general "relationship", not a parent-child hierarchy.

See this for the optimal way to write a many-to-many table.

Then you need to write multiple queries rather than a single fixed-depth query. Before MySQL 8.0, which added recursive CTEs, there is no "good" way to traverse a tree/DAG/web in one statement. And the speed of multiple queries, with suitable indexes and suitable caching, should be quite adequate.
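
For MySQL 8.0 and later, the traversal can be sketched with a recursive CTE. The table relationship, its columns parent_id and child_id, and the starting item 1 are hypothetical names chosen for illustration. They are not taken from the question.

```sql
-- Hypothetical many-to-many table: one row per (parent, child) edge
CREATE TABLE relationship (
    parent_id INT NOT NULL,
    child_id  INT NOT NULL,
    PRIMARY KEY (parent_id, child_id),
    INDEX (child_id, parent_id)
);

-- MySQL 8.0+: every descendant of item 1, with its shortest depth
WITH RECURSIVE descendants AS (
    SELECT child_id, 1 AS depth
    FROM relationship
    WHERE parent_id = 1
    UNION ALL
    SELECT r.child_id, d.depth + 1
    FROM descendants AS d
    JOIN relationship AS r ON r.parent_id = d.child_id
)
-- An item reachable through several parents appears once per path,
-- so collapse the duplicates and keep the smallest depth
SELECT child_id, MIN(depth) AS depth
FROM descendants
GROUP BY child_id;
```

This sketch assumes the data has no cycles. On a true "web" with cycles, the recursion would keep going until it hits MySQL's recursion depth limit and fails, so a cycle guard would be needed.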
