PostgreSQL – Return Default Value if Regex Match Fails

functionspattern matchingpostgresqlregexset-returning-functions

I'd like to attempt a regex match, and return null if it fails.

The following query attempts to find the first number in a string. The result ignores the entry with text 'blah'. I'd prefer it to return a null value instead.

This question is probably less about regex and more about set algebra. My hunch is that there is an elegant way to do it, without needing to left join anything, though googling is proving futile.

with test_data as (
  select 'abc 123' as txt
  union
  select 'abc 456' as txt
  union
  select 'blah' as txt
)

select
  txt,
  (regexp_matches(txt, '\d+'))[1] as first_num
from
  test_data

Best Answer

While your answer solves the problem with regexp_matches(), the much better solution is to avoid the problem with the more appropriate function substring() to begin with:

WITH test_data(txt) AS (
   VALUES
      (text 'abc 123')
    , ('abc 456')
    , ('blah')
   )
SELECT txt, substring(txt FROM '\d+') AS first_num
FROM   test_data;

Does exactly what you need: returns the first match in the string or NULL for no match.

substring(string FROM pattern) is standard SQL syntax.
Don't confuse it with substring(string [FROM int] [FOR int]) or substring(string FROM pattern FOR escape) doing something else.

You can also use the shorter, equivalent, currently undocumented Postgres implementation substring(text, text) (with comma (,) instead of FROM).

Use regexp_matches() to return all matches as set. (Note that a single regexp can result in multiple matches, hence a set of text arrays). You still only get the first row unless you add the flag 'g' to get all rows (additional matches - if any - for the same regexp). Related:

To avoid complications move set-returning functions to the FROM list in versions before Postgres 10:

Or use regexp_match() in Postgres 10 or later to only return the first row. This would also not exhibit the original problem:

WITH test_data(txt) AS (
   VALUES
      (text 'abc 123')
    , ('abc 456')
    , ('blah')
   )
SELECT txt, (regexp_match(txt, '\d+'))[1] AS first_num
from   test_data;

But just use substring() for your purpose.

db<>fiddle here