I have different phone formats stored in my db, such as :
727.170.4799
1-416-958-6390
1.561.374.4268
(813) 929-5892
There are approximately ~2 million records in this table so modifying would take a long time. And I need to find records by performing phone lookup.
One thing I control is the input, I can guarantee that it's going to be numeric. But what I don't know is whether they're going to type in numeric number with or without the country code so taking this number as an example 1-416-958-6390
both of these used in lookup:
14169586390
4169586390
Should return the same row (1-416-958-6390
)
How do I do phone for phone numbers as efficiently as I can?
This is what I got but it's super slow (as expected) :
select * from patients
WHERE ( REPLACE(phone_number, '-', '') LIKE '%4169586390' )
Best Answer
To clean up the numbers use a regex that removes everything that's not a numeric digit:
You can create an index on that, but a regular B-Tree index won't be used LIKE conditions where the wildcard is on the left.
In Postgres you have two choices: you can create a regular B-Tree index on the reverse value and use that instead:
Then use that expression and compare it with the reversed input:
The other choice is to install the pg_trgm extension and create a trigram index on the expression. That way you can use the input "as is":
That index is then usable for:
However, GiST indexes are bigger and a bit slower to maintain than B-Tree indexes.
If you don't want to repeat that regex expression in every query you can create a view that includes that, e.g.:
Then use (assuming you are using the GiST index, so no reverse required):
Postgres is smart enough to still use the index in that case.
If you are on Postgres 12, an alternative to the view is to use a computed column with the regex expression, then create the index on the computed column.