Sql-server – SQL Server INNER REMOTE JOIN returns more rows than INNER JOIN

hintssql serversql-server-2008-r2

I'm trying to join a few rows from a remote view to a local table. The view has about 300 million rows, so I want to use the REMOTE query hint so that all 3 million rows don't have to be transferred to my computer.

SELECT R.Something, L.ID, L.Something
FROM [dbo].[LocalTable] L
INNER JOIN (
    SELECT TOP 100 Something, L_ID FROM [RemoteServer].[RemoteDB].[dbo].[RemoteTable]
    ) R
ON L.ID = R.L_ID

This returns 100 rows, as I expected, and takes basically no time, as I expected.

However,

SELECT R.Something, L.ID, L.Something
FROM [dbo].[LocalTable] L
INNER REMOTE JOIN (
    SELECT TOP 100 Something, L_ID FROM [RemoteServer].[RemoteDB].[dbo].[RemoteTable]
    ) R
ON L.ID = R.L_ID

starts to return thousands of rows. I quit it after a few seconds, but it was in the tens – hundreds of thousands.

How could a query hint change my result set?

Best Answer

TOP 100 with no ORDER BY means it is undeterministic which 100 rows from the remote table end up participating in the join. This is execution plan dependant and can vary.

If it is a one to many relationship it may be the case that one batch of 100 rows has more matches on the other side of the join than another different batch of 100 rows.

You should specify an ORDER BY (inside the derived table) on some unique column or combination of columns to ensure deterministic results.