Sql-server – Number of logical reads in SQL server JOIN

join;sql server

Please refer to the SELECT query shown below, which includes a JOIN between two tables that I have renamed as Table1 and Table2, where the latter is a temporary one:

Table1 has over 15M records, whereas Table2 is around 9K.
The product team says the JOIN acts as a filter as far as reads are concerned, so there should be 9K reads at most. However, the DBA insists that the JOIN here is not a filter and that there are more than 14M reads because there is no WHERE clause.

My humble opinion (not being a SQL developer or DBA myself) is that the JOIN can be thought of as a filter in the final resultset, but SQL Server somehow has to read the entire Table1 before it can actually perform the JOIN with Table2.

What prompted this discussion was a performance issue and the fact that the execution plan shows more than 14M actual rows associated with this query:

Unfortunately, this is proprietary code I am not allowed to post publicly. My question is more geared towards what takes place during a SQL JOIN under the hood. Does SQL Server need to read the entire tables before returning the result set?

The exact question would be: Does it make any sense that SQL Server had to make +14M reads to perform the JOIN in this scenario?

Best Answer

There are cases where every single row from your larger table could be read, and cases where it is closer to the number from your smaller table. SQL Server will pick the method it thinks is most efficient based upon the information it has.

For example, if you have an index on the larger table that includes the join column, there is likely going to be less reading, as it should be able to skip a bunch of records. If you don't, then the only way it will know what rows to select is by reading each row. Keep in mind that statistics will impact the choices made, and if they are out of date it could lead to a bad choice.
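As a rough sketch of how one might check this, the T-SQL below measures logical reads for the join, adds an index on the join column, and measures again. The original query was not posted, so the column names `JoinKey` and `SomeColumn`, the index name, and the `dbo` schema are all assumptions for illustration.

```sql
-- Hypothetical names: dbo.Table1(JoinKey, SomeColumn), #Table2(JoinKey).
SET STATISTICS IO ON;   -- report logical reads per table with each query

SELECT t1.JoinKey, t1.SomeColumn
FROM dbo.Table1 AS t1
JOIN #Table2 AS t2
  ON t2.JoinKey = t1.JoinKey;

-- An index that leads with the join column gives the optimizer the option
-- of seeking into Table1 instead of scanning all of it.
CREATE NONCLUSTERED INDEX IX_Table1_JoinKey
    ON dbo.Table1 (JoinKey)
    INCLUDE (SomeColumn);   -- covers the selected column, avoiding lookups

-- Refresh statistics so row estimates reflect the current data.
UPDATE STATISTICS dbo.Table1;

-- Run the same query again and compare the logical reads reported.
SELECT t1.JoinKey, t1.SomeColumn
FROM dbo.Table1 AS t1
JOIN #Table2 AS t2
  ON t2.JoinKey = t1.JoinKey;

SET STATISTICS IO OFF;
```

Whether the reads actually drop depends on the plan the optimizer picks; it may still prefer a scan if it estimates that to be cheaper.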

Note that adding indexes has implications, and there are numerous cases where they hurt overall performance rather than improve it, so don't go creating them willy-nilly. I'd say ask your DBA, but...

Personally, I would question whether you want to retain your DBA. An INNER JOIN (which is what you have, with the INNER omitted) is a filter, and in many (but not all) cases you can move predicates between the WHERE clause and the ON clause with zero impact (SQL Server will optimize to the exact same query plan). So the "because there is no WHERE clause" argument is not only wrong, but wrong in a way that suggests a fundamental misunderstanding.
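To illustrate moving a predicate between ON and WHERE, here is a minimal sketch. The `Status` column and the other column names are hypothetical, since the real query was not shared.

```sql
-- Hypothetical columns: dbo.Table1(JoinKey, SomeColumn, Status), #Table2(JoinKey).

-- Filter written as part of the ON clause.
SELECT t1.JoinKey, t1.SomeColumn
FROM dbo.Table1 AS t1
INNER JOIN #Table2 AS t2
   ON t2.JoinKey = t1.JoinKey
  AND t1.Status = 'A';

-- Same filter written in the WHERE clause.
-- For an INNER JOIN both forms return the same rows.
SELECT t1.JoinKey, t1.SomeColumn
FROM dbo.Table1 AS t1
INNER JOIN #Table2 AS t2
   ON t2.JoinKey = t1.JoinKey
WHERE t1.Status = 'A';
```

One of the "not all" cases is an OUTER JOIN: there, a predicate in ON only decides which rows match, while the same predicate in WHERE removes rows from the result, so the two forms can return different rows.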

This is not to say your product team is correct. Looking at the screenshot, there are clearly more reads than 9K. If your product team is in Marketing, then I can understand their confusion (<grin>).

Lastly, the NOLOCK in there should be a big red flag. There are cases where it won't cause a problem, but before using it one should be able to clearly articulate what the problems are and why they won't be an issue.
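For reference, a sketch of what the hint amounts to, using hypothetical column names. NOLOCK on a table is the same as reading that table at the READ UNCOMMITTED isolation level, which permits dirty reads (rows from transactions that may later roll back).

```sql
-- Table hint form: applies READ UNCOMMITTED semantics to dbo.Table1 only.
SELECT t1.JoinKey
FROM dbo.Table1 AS t1 WITH (NOLOCK)
JOIN #Table2 AS t2
  ON t2.JoinKey = t1.JoinKey;

-- Session-level form: applies to every table read until it is changed back.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT t1.JoinKey
FROM dbo.Table1 AS t1
JOIN #Table2 AS t2
  ON t2.JoinKey = t1.JoinKey;

-- Restore the SQL Server default.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
```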

Related Solutions

Sql-server – SQL Server Join/where processing order

The logical processing order of a query is documented on MSDN (written by the Microsoft SQL Server team, not third parties):

1. FROM
2. ON
3. JOIN
4. WHERE
5. GROUP BY
6. WITH CUBE or WITH ROLLUP
7. HAVING
8. SELECT
9. DISTINCT
10. ORDER BY
11. TOP

A derived table follows this order, then the outer query goes through it again, and so on.
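The numbered steps above can be mapped onto a query as a sketch. The tables and columns reuse the `table1`/`table2` naming from elsewhere on this page and are purely illustrative.

```sql
-- Each comment gives the clause's position in the logical processing order.
SELECT DISTINCT TOP (10)            -- 8. SELECT, 9. DISTINCT, 11. TOP
       a.col3,
       COUNT(*) AS cnt
FROM table1 AS a                    -- 1. FROM
JOIN table2 AS b                    -- 3. JOIN
  ON b.col1 = a.col1                -- 2. ON
WHERE a.col3 > 0                    -- 4. WHERE
GROUP BY a.col3                     -- 5. GROUP BY
HAVING COUNT(*) > 1                 -- 7. HAVING
ORDER BY cnt DESC;                  -- 10. ORDER BY
```

One visible consequence of this order is that the alias `cnt` can be used in ORDER BY, which is processed after SELECT, but not in WHERE, which is processed before it.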

This is the logical order, though, not the actual one. No matter how SQL Server actually does it, these semantics are honoured to the letter. The "actual" order is determined by the Query Optimiser (QO), and you avoid the intermediate Cartesian product you mentioned.

It's worth mentioning that SQL is declarative: you say "what", not "how" as you would in a procedural/imperative language (Java, .NET). So saying "this happens before that" is wrong in many cases (e.g. assuming short-circuit evaluation or left-to-right WHERE order).
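As a sketch of why the short-circuit assumption is risky, consider a hypothetical table `dbo.Staging` with a `RawValue varchar(20)` column that holds a mix of numeric and non-numeric text. Neither the table nor the column comes from the original question.

```sql
-- Risky: this assumes ISNUMERIC is evaluated before CAST for every row.
-- The optimizer makes no such promise, so the CAST may still raise a
-- conversion error on a non-numeric value.
SELECT RawValue
FROM dbo.Staging
WHERE ISNUMERIC(RawValue) = 1
  AND CAST(RawValue AS int) > 10;

-- Safer: TRY_CAST (SQL Server 2012 and later) returns NULL instead of
-- raising an error, so the result does not depend on evaluation order.
SELECT RawValue
FROM dbo.Staging
WHERE TRY_CAST(RawValue AS int) > 10;
```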

In your case above, the QO will generate the same plan no matter how it is structured because it is a simple query.

However, the QO is cost-based, and for a complex query it might take two weeks to generate the ideal plan. So it settles for a "good enough" plan, which sometimes actually isn't.

So your first case may help the optimiser find a better plan because the logical processing order is different for the 2 queries. But it may not.

I have used this trick on SQL Server 2000 to get a 60x speed improvement on reporting queries. As the QO improves from version to version, it gets better at working these things out.

As for the book you mentioned, there is some dispute over it.
See SO and the subsequent links: https://stackoverflow.com/q/3270338/27535

Sql-server – Is it correct, order of where clause doesn’t matter when it is used with join

The order of items in the where clause should not make a difference, especially if you use the preferred join syntax as follows:

select a.col1, 
       b.col2 
 from table1 a 
 join table2 b on b.col1 = a.col1
where a.col3 = 10

This keeps the join conditions separated from the filters.
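For contrast, here is a sketch of the same query in the older comma-join style, where the join condition and the filter share the WHERE clause. Listing the two predicates in either order is expected to give the same result.

```sql
-- Older style: the join condition is mixed in with the filter.
select a.col1,
       b.col2
  from table1 a,
       table2 b
 where b.col1 = a.col1   -- join condition
   and a.col3 = 10       -- filter
```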
