SQL Server – Alternatives to ‘OR’ in WHERE Clause

performancequery-performancesql serversql-server-2012

I have a query like this:

SELECT  ordIdent.Identifier, ord.OrderId
FROM    OrderIdentifier ordIdent
        JOIN [order] ord
            ON ordIdent.OrderId = ord.OrderId
WHERE   ordIdent.Identifier = '29584'
        OR ord.ClientOrderId = '29584'

The idea is that we are searching using the value '29584' and we don't really know if it is a ClientOrderId or an Identifier, so we want to search both.

This gives really really bad performance. It pulls millions of rows (125 thousand from one and 20 Million from the other)

But this query:

SELECT  ordIdent.Identifier, ord.OrderId
FROM    OrderIdentifier ordIdent
        JOIN [order] ord
            ON ordIdent.OrderId = ord.OrderId
WHERE   ord.ClientOrderId = '01193504' 

UNION

SELECT  ordIdent.Identifier, ord.OrderId
FROM    OrderIdentifier ordIdent
        JOIN [order] ord
            ON ordIdent.OrderId = ord.OrderId
WHERE   ordIdent.Identifier = '01193504'

has good performance. Normally I would just do this and call it a day.

But my actual query is very large, and I would rather not call it twice (though it is still better than using the "or".) So before I just set it up to be called twice, I thought I would ask around and see if I have other options.

It seems I should be to do some kind of left join to make this work. But I can't seem to wrap my head around it.

Is there a way to do this in one query that has a left join instead of an "OR"? (Or some other kind of option?)

UPDATE:
OK, I got thinking about it and I realized I could just factor out the OR into a CTE that uses the union and then use the CTE the actual query.

I will leave this open in case there is a better way to deal with it, but a CTE fixes my issue.

Best Answer

I think that you can rewrite that query to use a FULL OUTER JOIN:

SELECT  ordIdent.Identifier, ord.OrderId
FROM    OrderIdentifier ordIdent
        FULL OUTER JOIN [order] ord
            ON ordIdent.OrderId = ord.OrderId
            AND ( ordIdent.Identifier = '29584'
                    OR ord.ClientOrderId = '29584')
WHERE ordIdent.Identifier IS NOT NULL AND ord.OrderId IS NOT NULL;

But why would you want to? You'll end up reading all rows from both tables with that syntax which causes the performance problem that you're trying to avoid. UNION or UNION ALL is the right way to go here. For filters like you have you can still use UNION ALL like so:

SELECT  ordIdent.Identifier, ord.OrderId
FROM    OrderIdentifier ordIdent
        JOIN [order] ord
            ON ordIdent.OrderId = ord.OrderId
WHERE   ord.ClientOrderId = '01193504' 

UNION

SELECT  ordIdent.Identifier, ord.OrderId
FROM    OrderIdentifier ordIdent
        JOIN [order] ord
            ON ordIdent.OrderId = ord.OrderId
WHERE   ordIdent.Identifier = '01193504' AND (ord.ClientOrderId 
 IS NULL OR ord.ClientOrderId <> '01193504')

Sometimes there's a slight performance benefit in doing so. It depends on your data.