Sql-server – Join query using OR on two different columns

sql serversql server 2014

Is there a more efficient way (performance wise) to write this?

select * 
from stat.UniqueTeams uTeam
Left Join stat.Matches match on match.AwayTeam = uTeam.Id 
                             or match.HomeTeam = uTeam.id

Best Answer

Putting certain types of OR logic in a join clause will prohibit SQL Server from implementing the join as a merge join or hash join. The query that you provided appears to be such a query.

CREATE TABLE X_UniqueTeams (ID INT);
CREATE TABLE X_Matches (AWAYTEAM INT, HOMETEAM INT);

select * 
from X_UniqueTeams uTeam
Left HASH Join X_Matches match 
on match.AwayTeam = uTeam.Id or match.HomeTeam = uTeam.id;

select * 
from X_UniqueTeams uTeam
Left MERGE Join X_Matches match 
on match.AwayTeam = uTeam.Id or match.HomeTeam = uTeam.id;

Both of those queries error out with the following:

Msg 8622, Level 16, State 1, Line 37
Query processor could not produce a query plan because of the hints defined in this query.
Resubmit the query without specifying any hints and without using SET FORCEPLAN.

Only a nested loop join is available to the query optimizer for your query. For some table structures and data structures a nested loop join can be very inefficient. In general, restricting the physical join operators available to the query optimizer is not a good idea. You want other physical join operators to be available so the query optimizer can find a plan with the lowest estimated cost.

There are two common rewrites that I know of to avoid OR logic in a join clause. The first is that sometimes you can just split the LEFT OUTER JOIN into two joins if the cardinality between the tables is 1:1. That doesn't appear to be the case here so that option isn't available to you.

The second option is to use UNION or UNION ALL. You should prefer UNION ALL whenever possible because it avoids a sorting or hashing step.

select * 
from stat.UniqueTeams uTeam
Left Join stat.Matches match 
on match.AwayTeam = uTeam.Id 
UNION --ALL
select * 
from stat.UniqueTeams uTeam
Left Join stat.Matches match 
on match.HomeTeam = uTeam.id

For this query in particular, first verify that it does what you need it to. Do you truly need to return all columns with SELECT *? Do you want to return teams that haven't played any games? Once your query syntax is correct consider your response time requirements. How fast does your query need to be? Once you know that try testing out the different rewrites available to you. If none of them are fast enough consider adding indexes on the tables.

If you need further help than this you will need to provide the information that the commentators asked for. Even if this answer is sufficient to answer your question, it's good practice to provide that information upfront in future questions.

Related Solutions

How to Perform Left Join with Extra Null Row for Every Match in SQL Server

I see 3 ways to do this but all involve a UNION ALL:

your version`:

SELECT
    T1.C1, ....., T1.CN,
    T2.C1, ..., T2.CM
FROM
    Table1 T1 JOIN Table2 T2
        ON T1.Key1 = T2.Key1

UNION ALL

SELECT
    T1.C1, ..., T1.CN,
    NULL, ... NULL
FROM
    Table1 T1 ;

slightly changing the second part:

SELECT
    T1.C1, ....., T1.CN,
    T2.C1, ..., T2.CM
FROM
    Table1 T1 JOIN Table2 T2
        ON T1.Key1 = T2.Key1

UNION ALL

SELECT
    T1.C1, ....., T1.CN,
    T2.C1, ..., T2.CM
FROM
    Table1 T1 LEFT JOIN Table2 T2
        ON 0 = 1 ;                   -- FALSE

first a UNION, then join:

SELECT
    T1.C1, ....., T1.CN,
    T2.C1, ..., T2.CM
FROM
    Table1 T1 JOIN  
      ( SELECT * FROM T2
        UNION ALL
        SELECT NULL, ..., NULL
       ) AS T2
        ON T1.Key1 = T2.Key1 
        OR T2.Key1 IS NULL ;

I don't think there will be much difference in execution plans and efficiency but the first one seems more simple.

Improving Query Performance by Removing Hash Match Inner Join in SQL Server 2014

the following links will provide a good source of knowledge regarding execution plans.

From Execution Plan Basics — Hash Match Confusion I found:

From http://sqlinthewild.co.za/index.php/2007/12/30/execution-plan-operations-joins/

"The hash join is one of the more expensive join operations, as it requires the creation of a hash table to do the join. That said, it’s the join that’s best for large, unsorted inputs. It is the most memory-intensive of any of the joins

The hash join first reads one of the inputs and hashes the join column and puts the resulting hash and the column values into a hash table built up in memory. Then it reads all the rows in the second input, hashes those and checks the rows in the resulting hash bucket for the joining rows."

which links to this post:

http://blogs.msdn.com/b/craigfr/archive/2006/08/10/687630.aspx

Can you explain this execution plan? provides good insights about the execution plan with, not specific to hash match but relevant.

The constant scans are a way for SQL Server to create a bucket into which it's going to place something later in the execution plan. I've posted a more thorough explanation of it here. To understand what the constant scan is for, you have to look further into the plan. In this case, it's the Compute Scalar operators that are being used to populate the space created by the constant scan.

The Compute Scalar operators are being loaded up with NULL and the value 1045876, so they're clearly going to be used with the Loop Join in an effort to filter the data.

The really cool part is that this plan is Trivial. It means that it went through a minimal optimization process. All the operations are leading up to the Merge Interval. This is used to create a minimal set of comparison operators for an index seek (details on that here).

In this question: Can I get SSMS to show me the Actual query costs in the Execution plan pane? I'm fixing performance issues on a multistatement stored procedure in SQL Server. I want to know which part(s) I should spend time on.

I understand from How do I read Query Cost, and is it always a percentage? that even when SSMS is told to Include Actual Execution Plan, the "Query cost (relative to the batch)" figures is still based on cost estimates, which can be far off actuals

Measuring Query Performance : “Execution Plan Query Cost” vs “Time Taken” gives good info for when you need to compare the performance of 2 different queries.

In Reading a SQL Server Execution plan you can find great tips for reading the execution plan.

Other questions/answers that I really liked because they are relevant to this subject, and for my personal reference I would like to quote are:

How to optimise T-SQL query using Execution Plan

can sql generate a good plan for this procedure?

Execution Plans Differ for the Same SQL Statement

Best Answer

Related Solutions

How to Perform Left Join with Extra Null Row for Every Match in SQL Server

Improving Query Performance by Removing Hash Match Inner Join in SQL Server 2014

Related Question