Sql-server – Specifying columns in LEFT JOIN through subquery beneficial performance wise

optimizationperformancesql server

I'm trying to understand how JOINS are handled, and was wondering about the following:

Imagine a big query concerning a lot of tables, I'm really looking for small optimizations as it's a pretty important query. What would be faster in the long run. The 'normal':

SELECT 
    TOP x
    A.a
    , A.xxx
    , B.a
    , B.xxx
FROM A 
LEFT JOIN B ON (A.xxx = B.xxx)
ORDER BY A.a

Or specifying the columns of the joined table beforehand?

SELECT 
    TOP x
    A.a
    , A.xxx
    , B.a
    , B.xxx
FROM A 
LEFT JOIN 
(
    SELECT 
        B.A AS A
        , B.xxx AS xxx
    FROM B
 ) AS B ON (A.xxx = B.xxx)
ORDER BY A.a

I guess this boils down to having a good understanding of how SQL Server handles optimization. In my preliminary tests I haven't found any difference, but these tests aren't quite as complicated as 'The Real Thing'. Will the latter just be optimized to the former by the interpreter?

Since I have never seen the latter side being used (but admittedly, I'm really new to this), my gut feeling tells me the first one will be preferred, but a gut feeling won't (or shouldn't) hold up as real evidence.

Bare in mind the real query really has alot of tables involved with probably only 1 or 2 relevant columns per table, so I was thnking perhaps I can doe some sort of preselection to avid including the whole table in the query.

So, any ideas?

Best Answer

For most simple cases, like your, there will not be any difference. SQL Server is lazy and will figure out the most easiest way of completing your query and it will do a very good job at it (most of the time).

The best way of analyzing the difference yourself is to look at the query plan of both queries.

Given this example of two queries similar to yours:

SELECT 
    c.CustomerId,
    o.OrderId,
    o.ArticleId,
    o.Customerid
FROM 
    dbo.Customer c
INNER JOIN 
    dbo.orders o
    ON o.CustomerId = c.CustomerId


SELECT 
    c.CustomerId,
    o.OrderId,
    o.ArticleId,
    o.Customerid
FROM 
    dbo.Customer c
INNER JOIN 
(
    SELECT 
        o.OrderId,
        o.ArticleId,
        o.Customerid
    from    dbo.orders o
) AS o
    ON o.CustomerId = c.CustomerId

The query plans for both will look exactly the same. Query Plan of both queries

If you hover your mouse over the Index Scan operator you will see the Output List at the bottom of the popup. The Output List will show exactly which columns that will be "extracted" from the table. Both queries produce the same Output Lists which means that there is no benefit of "pre-selecting" the columns because SQL Server is already doing this for you.

This is true in your example since you are specifically specifying columns from both tables in your select clause, if you would do SELECT * then there could be a difference.