Sql-server – Execution Plan Basics — Hash Match Confusion

execution-plansql serversql-server-2008-r2

I am starting to learn execution plans and am confused about how exactly a hash match works and why it would be used in a simple join:

select Posts.Title, Users.DisplayName
From Posts JOIN Users on
Posts.OwnerUserId = Users.Id
OPTION (MAXDOP 1)

enter image description here

As I understand it the results of the Top index scan become the hash able and each row in the bottom Index clustered scan is looked up. I understand how hash tables work to at least some degree, but I am confused about which values exactly get hashed in an example like this.

What would make sense me is the the common field between them, the id, is hashed — but if this is the case, why hash a number?

Best Answer

As SQLRockstar's answer quotes

best for large, unsorted inputs.

Now,

  • from the Users.DisplayName index scan (assumed nonclustered) you get Users.Id (assuming clustered) = unsorted
  • You are also scanning Posts for OwnerUserId = unsorted

This is 2 unordered inputs.

I'd consider an index on the Posts table on OwnerUserId, including Title. This will add some order on one side of the input to the JOIN + it will be covering index

CREATE INDEX IX_OwnerUserId ON Posts (OwnerUserId) INCLUDE (Title)

You may then find that the Users.DisplayName index won't be used and it will scan the PK instead.