SQL Server 2014 – Find Records with Same String and Extra Character

full-text-searchsql serversql server 2014string-searching

OK, so I have a Microsoft SQL Server 2014 database table called owner with around 90,000 records with owner information, another called vehicle with vehicle information

Owner_Name                   owner_id       V_name     owner_id    exempt
-------------------------------------       ------------------------------
JACOB JAMISON & JESSICA           35        Civic            35        H3
JACOB JAMISON M & JESSICA B       39        Accord           39        H3 
BLACKSON BARRINGTON               56        Bugatti          56        H6
BLACKSON BARRINGTON H             98        SSC              98        H7
BRUSTER MICHAEL                   107       Corvette         107       H9

I'm trying to find all records that have more than one exemption on a vehicle
( H0 means no exemption). This code below has worked well, as long as the name is exactly the same. However, if there's a variation, such as an extra letter or if it's entered backwards, then it won't return those records. I've looked at things like SOUNDEX, but this wouldn't work in my scenario.

SELECT Owner_name
     , COUNT(Owner_name) AS 'xNameAppears'
     , COUNT(v.exempt) AS 'ExemptionCount' 
FROM owner o
INNER JOIN vehicle V ON V.owner_id = o.owner_id
WHERE v.exempt <> 'H0'
GROUP BY O.owner_name
HAVING COUNT(v.exempt) > 1

Is there a solution that would allow me to return records like so, not knowing which owner_name's may be similar? Basically trying to get the server to search through the owner_name column and if there's a similarity such as
JACOB JAMISON & JESSICA and JACOB JAMISON M & JESSICA B then it'll return those records like so:

Owner_Name                      xNameAppears      ExemptCount
-------------------------------------------------------------      
JACOB JAMISON & JESSICA           2                         2
JACOB JAMISON M & JESSICA B       2                         2
BLACKSON BARRINGTON               2                         2
BLACKSON BARRINGTON H             2                         2

Thank you in advance!

Best Answer

The SOUNDEX function can be applied to a column as well.

But since

there's thousands like that

I wouldn't suggest just writing a query to join on a function to do that.

This will likely not perform very well on larger tables:

SELECT *
FROM dbo.vehicle AS v
JOIN dbo.vehicle AS v2
ON SOUNDEX(v2.Owner_Name) = SOUNDEX(v.Owner_Name)
AND v2.Owner_Name <> v.Owner_Name;

I'd rather do something that will make finding this easier in the long-term.

Here's an example:

CREATE TABLE dbo.vehicle (Owner_Name VARCHAR(50));
INSERT dbo.vehicle ( Owner_Name )
SELECT *
FROM (  
VALUES            
('JACOB JAMISON & JESSICA'),
('JACOB JAMISON M & JESSICA B'),
('BLACKSON BARRINGTON'),          
('BLACKSON BARRINGTON H'),        
('BRUSTER MICHAEL')
) AS x (Owner_Name);

I'm going to add a computed column based on the function, and then add an index to aid my query.

ALTER TABLE dbo.vehicle ADD Owner_Soundex AS SOUNDEX(Owner_Name);

CREATE INDEX ix_whatever ON dbo.vehicle (Owner_Soundex, Owner_Name);

Validate that everything looks good...

SELECT *
FROM dbo.vehicle AS v

Use a query like this to find imprecise matches:

SELECT *
FROM dbo.vehicle AS v
JOIN dbo.vehicle AS v2
ON v2.Owner_Soundex = v.Owner_Soundex
AND v2.Owner_Name <> v.Owner_Name;