SQL Execution Plan – Merge Join Turns to Hash Match with Second Join

execution-planjoin;

I would like to educate myself on what goes on under the hood in SQL Server, so I'm about to delve into Grant Fritchey's book Dissecting SQL Server Execution Plans. As it's 181 pages I just wanted to ask this simple question, which is what first got me interested in this in the first place – hopefully it will pique my interest (plus I'm too impatient to wait until I've waded through this tome to get some kind of answer!)

I am using this miniature version of Northwind to run this simple query:

SELECT Orders.OrderID
  ,Orders.OrderDate
  ,[Order Details].UnitPrice
FROM Orders
JOIN [Order Details]
  ON Orders.OrderID = [Order Details].OrderID

Which gives the following execution plan:

enter image description here

Which seems sensible, given both tables have indexing / sorting on the joining fields. But if I add another table (products) like so:

SELECT Orders.OrderID
  ,Orders.OrderDate
  ,[Order Details].UnitPrice
  ,Products.ProductName
FROM Orders
JOIN [Order Details]
  ON Orders.OrderID = [Order Details].OrderID
JOIN Products
  ON Products.ProductID = [Order Details].ProductID

Suddenly I have Hash matches. This seemed odd to me (remember I'm just beginning to learn this stuff!), as I thought hash matches were for large, unsorted joins. Why would SQL think one join type is okay in the first query and not the second, even though surely they are joining the same number of rows / same indexes etc between Orders and Order Details in both queries?

Orders has 830 rows, Order Details 2155 and Products 77.

OrderID has a clustered indexes on Orders and Order Details, and ProductID has a non-clustered index on Order Details and clustered on Products.

enter image description here

Thanks

Best Answer

I think this stack exchange link gives a good rundown of why it is doing what it is doing, and I believe it has most to do with the size of the results sets and how they are indexed/sorted in your case. Order and Order Details, from this database are 830 rows and 2155 rows respectively. These are similar results sets where one has 2 rows (roughly) for every one row in the other table. Order Details is clustered on OrderID, ProductID - meaning that the table is also sorted that way. Orders is obviously clustered on OrderID as well, meaning that it is also sorted on orderid. I believe this makes it easy to merge these two data sets that are already sorted in this way together.

Then you throw in Products. If you run the query

SELECT * FROM dbo.[Order Details] AS od
ORDER BY OrderID, od.ProductID

This is basically showing you what the logical order of the Order Details table is. And if you look at it from a products perspective it looks basically random. So now you are throwing in a join to a table that is only 77 rows and where the sorting does not really match up.

I believe that is why you see the Merge Join for the first and the Hash for the second.

Read that link above as it gives a nice description of these things and how they work. Another good one is at this other SE article.

Lots of information in those two and at many other places (including your book).