Db2 – Emulate REGEXP like behaviour in DB2 SQL

db2regular expression

I posted the same on stackoverflow (please, let me know if I have to delete one).

I'm working on a DB2 database and as far as I can see regexp is not supported (without additional libraries).

So I cannot implement something similar to what is explained in this article "Bringing the Power of Regular Expression Matching to SQL"

Do you know if I can "emulate", with a SQL statement, a regular expression like this?

^a[aofdmep]{1}[a-z]{1}[a-z0-9]{4}[sidbfkfpo]{1}

Edit

In the above hypothesis I found that it's acceptable for my case a WHERE Like predicate:

WHERE USER_NAME NOT LIKE 'a_______'

But it's a unsafe and doesn't cover other cases where I don't have a fixed char that I can match.

Best Answer

First off, the {1}s are redundant, so it's really just:

^[aofdmep][a-z][a-z0-9]{4}a[sidbfkfpo]

This is actually a pretty simple patterh ... it's just looking at what the first 8 characters of the string are, and it's always a fixed length string, so you could possibly build a table of all of the permutations, and then do:

WHERE SUBSTR(string_to_match, 1, 8) IN (
  SELECT valid_prefixes FROM table_of_things_to_match
)

Unfortunately, it's 7*26*36^4*9 ... 2.75 billion possible combinations, but, still, fixed strings, so you could do:

WHERE SUBSTR(string_to_match, 1, 1) IN ('a','o','f','d','m','e','p')
  AND SUBSTR(string_to_match, 2, 1) IN ('a','b','c','d' ... 'z')
  AND SUBSTR(string_to_match, 3, 1) IN ('a','b','c','d' ... 'z','0','1' ... '9')
  AND SUBSTR(string_to_match, 4, 1) IN ('a','b','c','d' ... 'z','0','1' ... '9')
  AND SUBSTR(string_to_match, 5, 1) IN ('a','b','c','d' ... 'z','0','1' ... '9')
  AND SUBSTR(string_to_match, 6, 1) IN ('a','b','c','d' ... 'z','0','1' ... '9')
  AND SUBSTR(string_to_match, 7, 1) = 'a'
  AND SUBSTR(string_to_match, 8, 1) IN ('s','i','d','b','f','k','p','o')

(fill in the ... bits, of course)

oops, there's two fs in that last character class, so there's only 2.45 billion permutations.

I'm not going to pretend that it's going to be fast ... it likely won't be, but it'll get you the pattern you're looking for. If you tend to do this a lot, I'd likely build a table of characters so you had an easy way to select out alpha / numeric or alpha-numeric.