Sql-server – Join query using OR on two different columns

sql serversql server 2014

Is there a more efficient way (performance wise) to write this?

select * 
from stat.UniqueTeams uTeam
Left Join stat.Matches match on match.AwayTeam = uTeam.Id 
                             or match.HomeTeam = uTeam.id

Best Answer

Putting certain types of OR logic in a join clause will prohibit SQL Server from implementing the join as a merge join or hash join. The query that you provided appears to be such a query.

CREATE TABLE X_UniqueTeams (ID INT);
CREATE TABLE X_Matches (AWAYTEAM INT, HOMETEAM INT);

select * 
from X_UniqueTeams uTeam
Left HASH Join X_Matches match 
on match.AwayTeam = uTeam.Id or match.HomeTeam = uTeam.id;

select * 
from X_UniqueTeams uTeam
Left MERGE Join X_Matches match 
on match.AwayTeam = uTeam.Id or match.HomeTeam = uTeam.id;

Both of those queries error out with the following:

Msg 8622, Level 16, State 1, Line 37
Query processor could not produce a query plan because of the hints defined in this query.
Resubmit the query without specifying any hints and without using SET FORCEPLAN.

Only a nested loop join is available to the query optimizer for your query. For some table structures and data structures a nested loop join can be very inefficient. In general, restricting the physical join operators available to the query optimizer is not a good idea. You want other physical join operators to be available so the query optimizer can find a plan with the lowest estimated cost.

There are two common rewrites that I know of to avoid OR logic in a join clause. The first is that sometimes you can just split the LEFT OUTER JOIN into two joins if the cardinality between the tables is 1:1. That doesn't appear to be the case here so that option isn't available to you.

The second option is to use UNION or UNION ALL. You should prefer UNION ALL whenever possible because it avoids a sorting or hashing step.

select * 
from stat.UniqueTeams uTeam
Left Join stat.Matches match 
on match.AwayTeam = uTeam.Id 
UNION --ALL
select * 
from stat.UniqueTeams uTeam
Left Join stat.Matches match 
on match.HomeTeam = uTeam.id

For this query in particular, first verify that it does what you need it to. Do you truly need to return all columns with SELECT *? Do you want to return teams that haven't played any games? Once your query syntax is correct consider your response time requirements. How fast does your query need to be? Once you know that try testing out the different rewrites available to you. If none of them are fast enough consider adding indexes on the tables.

If you need further help than this you will need to provide the information that the commentators asked for. Even if this answer is sufficient to answer your question, it's good practice to provide that information upfront in future questions.