Sql-server – Optimizing Negative Comparison in SQL

performanceperformance-tuningsql serversql-server-2016

In a SQL Server 2016 environment, I have a SELECT query which queries one table, which joins to itself on 2 indexed columns (both non-clustered):

SELECT
    A.REFERENCE,
    A.MEMBERID
FROM 
    TRANS A INNER JOIN 
    TRANS B ON A.REFERENCE = B.REFERENCE AND A.MEMBERID = B.MEMBERID

This one above returns in about 2 seconds.

But, when using the same query, but looking for matching references across different memberids e.g. changing = to <>, it takes about 20 seconds.

SELECT
    A.REFERENCE,
    A.MEMBERID,
    B.MEMBERID AS MEMBERID_B
FROM 
    TRANS A INNER JOIN 
    TRANS B ON A.REFERENCE = B.REFERENCE AND A.MEMBERID <> B.MEMBERID

I realize that this 2nd query uses a negative comparison and thus, must fully review table B vs. the 1st query which can just find the single match and move on (I think). In the query plan, it uses the same index and both are seeks, but the = estimates a single row involved, and the <> estimates 20 million rows (entire table).

My question is: without being able to alter any indexes or table structure, etc. how might I go about optimizing the query with the negative comparison? (or re-writing it to achieve the same results?) I've searched google, and found lots of info saying not to use negative comparisons where possible, but not much on optimizing when you must.

Unfortunately I can't post the exact code or anything from the db environment because I am at work and we're not permitted to post any actual code etc. e.g. Clipboard won't even work across the remote connection and can't use a browser in the machine I am accessing the DB from. My example is a dummy example to mimic the similar situation. I realize the best way is to examine indexes, structure, and query plans, but I was hoping to learn some general advice for the situation.

Best Answer

It's hard to say much given your need to keep your code and data confidential, but sometimes you can get better performance just by trying equivalent rewrites. The following query should return the same results but it's very likely to have a different query plan:

SELECT 
A.REFERENCE, 
A.MEMBERID, 
B.MEMBERID AS MEMBERID_B 
FROM 
TRANS A INNER JOIN 
TRANS B ON A.REFERENCE = B.REFERENCE AND A.MEMBERID > B.MEMBERID 

UNION ALL 

SELECT 
A.REFERENCE, 
A.MEMBERID, 
B.MEMBERID AS MEMBERID_B 
FROM 
TRANS A INNER JOIN 
TRANS B ON A.REFERENCE = B.REFERENCE AND A.MEMBERID < B.MEMBERID

As you said in chat, this rewrite finishes in 6 seconds which is a decent improvement over the original 20 seconds you were seeing with <>.