Longest prefix search in Oracle

full-text-searchindexoracle

I have a list of phone number prefixes defined for large number of zones (in query defined by gvcode and cgi).
I need to efficiently find a longest prefix that matches given number PHONE_NR.

I use inverted LIKE clause on field digits (which contains prefixes in form +48%, +49%, +1%, +1232% and so on).

Therefore I can't use normal index on that field.

I managed to get substantial improvement by using IOT on gvcode and cgi field (which are part (first two cols) of primary key).
I also looked at some oracle text indexes but can't find one that will match longer input with shorter prefix in the table.

Is there any other way to perform such search that is faster than this approach.

Here is the query which gives a list of all matched prefixes (I sort it afterwards on digits length).

  select  t.gvcode,  t.digits
                from NUMBERS t 
                    where 
                        t.gvcode=ZONE_SET_CODE 
                        and t.cgi=cgi_f
                       and ( PHONE_NR like t.digits)
                         order by length(digits) desc 

Best Answer

This turns up a lot in the Telco space (where I work). There are several approaches you can use to get reasonable performance while still remaining entirely in the database.

In all cases, you need to drop the "%" from your DIGITS column, that extra character isn't helping you at all.

Option 1. Iterative function.

DECLARE
    p VARCHAR(30);                
BEGIN
    p := :full_number;

    WHILE (LENGTH (p) > 0)
    LOOP
        BEGIN
            SELECT [whatever]
            INTO :whatever
            FROM numbers t
            WHERE digits = p;

            RETURN;
        EXCEPTION
            WHEN NO_DATA_FOUND THEN
                p := SUBSTR (p, 0, LENGTH (p) - 1);
        END;                    
    END LOOP;
END;

This kind of function performs reasonably well, assuming you have a nice index on DIGITS.

Option 1. START/END range keys. Create two secondary columns (BOTH INDEXED).

DIGITS_RANGE_START
DIGITS_RANGE_END

These are derived values that you set by an insert/update trigger.

DIGITS_RANGE_START should be automatically set to DIGITS || CHAR(0)
DIGITS_RANGE_END should be automatically set to DIGITS || CHAR(255)

Now your query can be

SELECT ... FROM numbers t 
WHERE (number_to_match || CHR(1) > digits_range_start) 
  AND (number_to_match || CHR(1) < digits_range_end) AND ... [other conditions]
ORDER BY LENGTH (digits) DESC;

This will hit the indexes with a nice comparison match, and Oracle should be able to do a decent job. It may return multiple matches, you need to take the first one.

More Options

See also this famous thread for fancy options with Index Organized Tables if you want to get real clever and real fast.

http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:4246230700346756268