I'm creating a function to find out what search pattern would best fit a single string. The string to match will be compared against a table of what could be possibly hundreds of possible patterns to see if it matches any of them.
I've got a not-so-elegant function working right now that will read all patterns into a memory table and iterate though them to see if each matches. It works but makes me sick to my stomach to think of using this in a production environment, especially if the pattern table grows to thousands of patterns.
Example string to match: 'Master'
Possible patterns:
- m%r
- abc%
- xyz%
- a%bcd
The outcome would be a match to pattern #1.
In the scenario, I need to be able to match potential part numbers, brands, and other possibly misspelled phrases. I'm incorporating Double Metaphone but when it comes to other items that only a pattern will catch, I need to know what pattern was the match among many. The "StringToSearch" would only return 1 matched pattern.
Does anyone have theory or other code types that can help me with this reverse pattern matching process?
Best Answer
Without knowing anything about what your actual scenario looks like, I'd perhaps do something like:
You'll get a better set of answers if you more accurately describe exactly what your scenario is by explaining how the data is designed, and what you exact design goals are.
To see if this methodology is effective enough, I've extended the MCVE above by adding a couple of indexes and a bunch of data:
The tables now have clustered indexes:
This inserts 50,000 words into the
Strings
table:A sample of the rows from that table:
Next, we'll create 1,000,000 rows in the
Matches
table with wildcards:Here's a sample of those rows:
As you can see from the sample above, some of the rows have a T-SQL single-character wildcard,
_
, and some rows don't have wildcards.Now, we'll run some code to see if we can find a row from
Matches
that matches up to a row inStrings
:The
WHERE
clause randomly chooses a word from theStrings
table. You may need to run this several times before you get a word with matching values in theMatches
table.Looking up any given word in the
Strings
table, with associatedMatches
:The plan for this query:
On my system, this query executes in ~100 milliseconds.
The results of that query:
If you were going to run hundreds or thousands of these queries per minute, you might want to re-think the strategy, but that's putting the horse before the cart.
If you want to implement this as a function, I'd consider this:
That code creates a schema-bound table-valued-function that gets "inlined" with the target query, and is about as efficient as you can get for this type of thing.
This would be an example of how you'd call the function: