Faulty oracle11g regexp_replace on removing duplicates

oracleregexregular expression

I have a comma separated list of values (as one single string), which contains duplicates, and I want to remove them. I tried with REGEXP_REPLACE but for certain values it gives me a wrong result:

/* CORRECT */
select '308RONBADB046782001,308RONBADB046782001,308RONCRT0046782001,308RONCRT0046782001,308RONMFRT046782001,308RONMFRT046782001' as str,
       regexp_replace('308RONBADB046782001,308RONBADB046782001,308RONCRT0046782001,308RONCRT0046782001,308RONMFRT046782001,308RONMFRT046782001',
                  '([^,]+)(,*\1)+',
                  '\1') as str_uq
from dual
union all 
/* FAULTY */
select '106RONBADB0W2562701,106RONCRT00W2562701,106RPB0130350001,106RONBADB0W2562701' as str,
       regexp_replace('106RONBADB0W2562701,106RONCRT00W2562701,106RPB0130350001,106RONBADB0W2562701',
                  '([^,]+)(,*\1)+',
                  '\1') as str_uq
from dual

Running the above example, will provide the expected result for the first query, but for the second one, it just concatenates all the values without removing duplicates. I guess the fact that the values are starting with 1 has something to do with it.

Can you please help me out with the regex pattern? I cannot find the mistake…

Thank you!

Best Answer

In your 2nd example the values are unique, but you're searching for any repeated match and there's multiple matches:

'106RONBADB0W2562701,106RONCRT00W2562701,106RPB0130350001,106RONBADB0W2562701'

It's not concatenating, it's removing some characters :-)

Usually adding a word boundary \b should work, but not in Oracle.

I found this is working '(.+?,)\1+', but you need to add a final comma to the string (fiddle)

/* CORRECT */
select '308RONBADB046782001,308RONBADB046782001,308RONCRT0046782001,308RONCRT0046782001,308RONMFRT046782001,308RONMFRT046782001' as str,
       regexp_replace('308RONBADB046782001,308RONBADB046782001,308RONCRT0046782001,308RONCRT0046782001,308RONCRT0046782001,308RONMFRT046782001,308RONMFRT046782001' || ',',
                  --'(\b[^,]+)(,*\1)+', --should work, doesn't in Oracle
                  '(.+?,)\1+',
                  '\1') as str_uq
from dual
union all 
/* FAULTY */
select '106RONBADB0W2562701,106RONCRT00W2562701,106RPB0130350001,106RONBADB0W2562701' as str,
       regexp_replace('106RONBADB0W2562701,106RONCRT00W2562701,106RPB0130350001,106RONBADB0W2562701' || ',',
                  --'(\b[^,]+)(,*\1)+', --should work, doesn't in Oracle
                  '(.+?,)\1+',
                  '\1') as str_uq
from dual