Sql-server – Number of logical reads in SQL server JOIN

join;sql server

Please refer to the SELECT query shown below, which includes a JOIN between two tables that I have renamed as Table1 and Table2, where the latter is a temporary one:

enter image description here

  • Table1 has over 15M records, whereas Table2 is around 9K.
  • Product team says the JOIN acts as a filter as far as reads are concerned and that they should be 9K at most. However, DBA insists that the JOIN here is not a filter and that the reads are more than 14M because there is no WHERE clause here.

My humble opinion (not being a SQL developer or DBA myself) is that the JOIN can be thought of as a filter in the final resultset, but SQL Server somehow has to read the entire Table1 before it can actually perform the JOIN with Table2.

The reason that prompted this discussion was a performance issue and the fact that the execution plan shows +14M of actual rows associated with this query:

enter image description here

Unfortunately, this is proprietary code I am not allowed to post publicly. My question is more geared towards what takes place during a SQL JOIN under the hood. Does SQL Server need to read the entire tables before returning the result set?

The exact question would be: Does it make any sense that SQL Server had to make +14M reads to perform the JOIN in this scenario?

Best Answer

There are cases where every single row from your larger table could be read, and cases where it is closer to the number from your smaller table. SQL Server will pick the method it thinks is most efficient based upon the information it has.

For example, if you have an index on the larger table that includes the join column, there is likely going to be less reading, as it should be able to skip a bunch of records. If you don't, then the only way it will know what rows to select is by reading each row. Keep in mind that statistics will impact the choices made, and if they are out of date it could lead to a bad choice.

Note there are implications for adding indexes, and there are numerous cases where they hurt over all performance, versus increasing it, so don't go willy-nilly in creating them. I'd say ask your DBA, but...

Personally, I would question if you want to retain your DBA. An INNER JOIN (which is what you have, with the INNER omitted) is a filter and in many (but not all) cases you can move predicates between the WHERE clause and the ON clause with zero impact (SQL Server will optimize to the exact same query plan), so the "because there is no WHERE clause" is not only wrong, but in fundamentally misunderstand sort of way.

This is not to say your product team is correct. Looking at the screen shot, there are clearly more reads than 9K. If your products team is in Marketing, then I can understand their confusion (<grin>).

Lastly, the NOLOCK in there should be a big red flag. There are cases where it won't cause a problem, but one should be able to clearly be able to articulate what the problems are, and why it won't be an issue before using it.