Given a GUID for an event, say a login, how might I search for similar or dissimilar events using the data in the other columns?
For example:
Find me logins similar to [GUID] {Server: "s123", State: "NY", Status: "503", etc…}
OR
Find me logins that are the opposite of the given example.
I feel there must be a name for this, but I do not believe full-text search or "%LIKE%" syntax would be sufficient.
Is indexing the data with Solr or Elasticsearch the solution?
Best Answer
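The simplistic method is to count the number of columns that match between the record you are looking for (the needle) and each candidate record (the haystack), where the compared columns are ones in your tables such as "name" and "phone". You can use the same table as both the needle and haystack if you want (to find duplicates in a table).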
http://blog.databasepatterns.com/2014/12/fuzzy-record-matching-in-sql-p1.html
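As a rough illustration of the column-counting idea applied to the logins example from the question, here is a minimal sketch using Python's built-in sqlite3 module. The table name `logins`, its columns, the sample rows, and the helper `rank_by_matches` are all hypothetical, made up for this example; this is not the query from the linked post.

```python
import sqlite3

# Hypothetical schema and sample data, loosely based on the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logins (guid TEXT PRIMARY KEY, server TEXT, state TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO logins VALUES (?, ?, ?, ?)",
    [
        ("a1", "s123", "NY", "503"),
        ("b2", "s123", "NY", "200"),
        ("c3", "s123", "CA", "200"),
        ("d4", "s999", "TX", "200"),
    ],
)

def rank_by_matches(conn, guid, most_similar=True):
    """Rank every other login by how many columns equal those of `guid`."""
    order = "DESC" if most_similar else "ASC"
    # In SQLite each comparison evaluates to 1 or 0, so the sum is the
    # number of matching columns. A NULL in either column makes the
    # comparison NULL, so real data may need COALESCE or IS.
    sql = f"""
        SELECT h.guid,
               (h.server = n.server)
             + (h.state  = n.state)
             + (h.status = n.status) AS score
        FROM logins AS n
        JOIN logins AS h ON h.guid <> n.guid
        WHERE n.guid = ?
        ORDER BY score {order}, h.guid
    """
    return conn.execute(sql, (guid,)).fetchall()

similar = rank_by_matches(conn, "a1")
dissimilar = rank_by_matches(conn, "a1", most_similar=False)
```

Sorting the same score in ascending order gives the "opposite" logins asked about in the question. Every column counts equally here, which is the main weakness of this approach.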
The more correct, and much more complicated, method is to use probabilistic record linkage. There's an R package for it (recordlinkage).
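To give a feel for what probabilistic record linkage adds over plain counting, here is a minimal sketch of Fellegi-Sunter style match weights in plain Python. It does not use the R package mentioned above or mirror its API. The column names and the m/u probabilities are assumptions chosen for illustration; in practice they would be estimated from the data.

```python
import math

# Hypothetical per-column probabilities (assumed, not estimated):
#   m = P(column agrees | the two records are a true match)
#   u = P(column agrees | the two records are not a match)
FIELD_PROBS = {
    "server": (0.90, 0.10),
    "state": (0.95, 0.05),
    "status": (0.80, 0.30),
}

def match_weight(a, b, probs=FIELD_PROBS):
    """Sum log-likelihood-ratio weights over the compared columns."""
    total = 0.0
    for field, (m, u) in probs.items():
        if a[field] == b[field]:
            # Agreement on a discriminating column adds a large weight.
            total += math.log2(m / u)
        else:
            # Disagreement subtracts, more so for reliable columns.
            total += math.log2((1 - m) / (1 - u))
    return total

target = {"server": "s123", "state": "NY", "status": "503"}
near = {"server": "s123", "state": "NY", "status": "200"}
far = {"server": "s999", "state": "TX", "status": "200"}

near_score = match_weight(target, near)
far_score = match_weight(target, far)
```

Unlike the simple count, a column that rarely agrees by chance contributes more to the score than one that agrees often among unrelated records. High totals suggest similar records and strongly negative totals suggest dissimilar ones.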