How to match similar records using multiple columns

full-text-searchpattern matching

Given a GUID for an event, say logins, how might I search for similar or dissimilar events, utilizing the other columns data?

For example:
Find me similar logins to [GUID] {Server:"s123", State="NY", Status="503", etc…}

OR

Find me logins the opposite of that example given?

I feel there must been a name for this, but Full-Text, or "%Like%" syntax I do not believe would be sufficient.

Is indexing the data with solr or elasticsearch the solution?

Best Answer

The simplistic method is to count the number of columns that match:

select
 n.id as needle_id,
 h.id as haystack_id,
 case when n.name = h.name then 1 else 0 end 
 + case when n.phone = h.phone then 1 else 0 end as relevance
from 
 needles n
join 
 haystack h 
on 
 n.name = h.name 
 or n.phone = h.phone
order by 
 relevance desc;

where "name" and "phone" are columns in your tables. You can use the same table as both the needle and haystack if you want (to find duplicates in a table).

http://blog.databasepatterns.com/2014/12/fuzzy-record-matching-in-sql-p1.html

The more correct and much more complicated method is to use Probabilistic Record Linkage. There's an R package for it (recordlinkage)