I see 3 ways to do this but all involve a UNION ALL
:
your version`:
SELECT
T1.C1, ....., T1.CN,
T2.C1, ..., T2.CM
FROM
Table1 T1 JOIN Table2 T2
ON T1.Key1 = T2.Key1
UNION ALL
SELECT
T1.C1, ..., T1.CN,
NULL, ... NULL
FROM
Table1 T1 ;
slightly changing the second part:
SELECT
T1.C1, ....., T1.CN,
T2.C1, ..., T2.CM
FROM
Table1 T1 JOIN Table2 T2
ON T1.Key1 = T2.Key1
UNION ALL
SELECT
T1.C1, ....., T1.CN,
T2.C1, ..., T2.CM
FROM
Table1 T1 LEFT JOIN Table2 T2
ON 0 = 1 ; -- FALSE
first a UNION
, then join:
SELECT
T1.C1, ....., T1.CN,
T2.C1, ..., T2.CM
FROM
Table1 T1 JOIN
( SELECT * FROM T2
UNION ALL
SELECT NULL, ..., NULL
) AS T2
ON T1.Key1 = T2.Key1
OR T2.Key1 IS NULL ;
I don't think there will be much difference in execution plans and efficiency but the first one seems more simple.
the following links will provide a good source of knowledge regarding execution plans.
From Execution Plan Basics — Hash Match Confusion I found:
From
http://sqlinthewild.co.za/index.php/2007/12/30/execution-plan-operations-joins/
"The hash join is one of the more expensive join operations, as it
requires the creation of a hash table to do the join. That said, it’s
the join that’s best for large, unsorted inputs. It is the most
memory-intensive of any of the joins
The hash join first reads one of the inputs and hashes the join column
and puts the resulting hash and the column values into a hash table
built up in memory. Then it reads all the rows in the second input,
hashes those and checks the rows in the resulting hash bucket for the
joining rows."
which links to this post:
http://blogs.msdn.com/b/craigfr/archive/2006/08/10/687630.aspx
Can you explain this execution plan? provides good insights about the execution plan with, not specific to hash match but relevant.
The constant scans are a way for SQL Server to create a bucket into
which it's going to place something later in the execution plan. I've
posted a more thorough explanation of it here. To understand what the
constant scan is for, you have to look further into the plan. In this
case, it's the Compute Scalar operators that are being used to
populate the space created by the constant scan.
The Compute Scalar operators are being loaded up with NULL and the
value 1045876, so they're clearly going to be used with the Loop Join
in an effort to filter the data.
The really cool part is that this plan is Trivial. It means that it
went through a minimal optimization process. All the operations are
leading up to the Merge Interval. This is used to create a minimal set
of comparison operators for an index seek (details on that here).
In this question:
Can I get SSMS to show me the Actual query costs in the Execution plan pane?
I'm fixing performance issues on a multistatement stored procedure in SQL Server. I want to know which part(s) I should spend time on.
I understand from How do I read Query Cost, and is it always a percentage? that even when SSMS is told to Include Actual Execution Plan, the "Query cost (relative to the batch)" figures is still based on cost estimates, which can be far off actuals
Measuring Query Performance : “Execution Plan Query Cost” vs “Time Taken”
gives good info for when you need to compare the performance of 2 different queries.
In Reading a SQL Server Execution plan you can find great tips for reading the execution plan.
Other questions/answers that I really liked because they are relevant to this subject, and for my personal reference I would like to quote are:
How to optimise T-SQL query using Execution Plan
can sql generate a good plan for this procedure?
Execution Plans Differ for the Same SQL Statement
Best Answer
Putting certain types of
OR
logic in a join clause will prohibit SQL Server from implementing the join as a merge join or hash join. The query that you provided appears to be such a query.Both of those queries error out with the following:
Only a nested loop join is available to the query optimizer for your query. For some table structures and data structures a nested loop join can be very inefficient. In general, restricting the physical join operators available to the query optimizer is not a good idea. You want other physical join operators to be available so the query optimizer can find a plan with the lowest estimated cost.
There are two common rewrites that I know of to avoid
OR
logic in a join clause. The first is that sometimes you can just split theLEFT OUTER JOIN
into two joins if the cardinality between the tables is 1:1. That doesn't appear to be the case here so that option isn't available to you.The second option is to use
UNION
orUNION ALL
. You should preferUNION ALL
whenever possible because it avoids a sorting or hashing step.For this query in particular, first verify that it does what you need it to. Do you truly need to return all columns with
SELECT *
? Do you want to return teams that haven't played any games? Once your query syntax is correct consider your response time requirements. How fast does your query need to be? Once you know that try testing out the different rewrites available to you. If none of them are fast enough consider adding indexes on the tables.If you need further help than this you will need to provide the information that the commentators asked for. Even if this answer is sufficient to answer your question, it's good practice to provide that information upfront in future questions.