I have a comma separated list of values (as one single string), which contains duplicates, and I want to remove them. I tried with REGEXP_REPLACE
but for certain values it gives me a wrong result:
/* CORRECT */
select '308RONBADB046782001,308RONBADB046782001,308RONCRT0046782001,308RONCRT0046782001,308RONMFRT046782001,308RONMFRT046782001' as str,
regexp_replace('308RONBADB046782001,308RONBADB046782001,308RONCRT0046782001,308RONCRT0046782001,308RONMFRT046782001,308RONMFRT046782001',
'([^,]+)(,*\1)+',
'\1') as str_uq
from dual
union all
/* FAULTY */
select '106RONBADB0W2562701,106RONCRT00W2562701,106RPB0130350001,106RONBADB0W2562701' as str,
regexp_replace('106RONBADB0W2562701,106RONCRT00W2562701,106RPB0130350001,106RONBADB0W2562701',
'([^,]+)(,*\1)+',
'\1') as str_uq
from dual
Running the above example, will provide the expected result for the first query, but for the second one, it just concatenates all the values without removing duplicates. I guess the fact that the values are starting with 1
has something to do with it.
Can you please help me out with the regex pattern? I cannot find the mistake…
Thank you!
Best Answer
In your 2nd example the values are unique, but you're searching for any repeated match and there's multiple matches:
'106RONBADB0W2562701,106RONCRT00W2562701,106RPB0130350001,106RONBADB0W2562701'
It's not concatenating, it's removing some characters :-)
Usually adding a word boundary
\b
should work, but not in Oracle.I found this is working
'(.+?,)\1+'
, but you need to add a final comma to the string (fiddle)