Sql-server – How does JOIN involving database across 2 servers store the intermediate data

sql servert-sql

In case of a SQL JOIN on 2 databases across servers.

For example:

Select *
From server1.db1.dbo.tbl1 a
Join server2.db2.dbo.tbl2 b
On a.common_id=b.common_id
Where a.some_col = 1000

Suppose this query is run on server 2 via SSMS, then where does the query processing work? For example:

  1. Data for server1.db1.dbo.tbl1 needs fetching

  2. Then data for server2.db2.dbo.tbl2 needs fetching

  3. Then join to be performed.

  4. Then a filter

Where (on which server) does each of the above operation happen?

Best Answer

As Dan points out, it's dependent on the execution plan that gets generated, as the processing can occur on either end (the remote server or local server which executed the query). Usually you can determine which server it was actually processed on by seeing if there's a Remote Query vs a Remote Scan operation in your query's execution plan.

A Remote Query usually means the query is sent to the remote server to be processed, and the predicates will be performed remotely, resulting in less data being return back locally.

A Remote Scan generally results in all the data from the remote entities (e.g. the entire tables) to be returned to the local server first, then the query is processed, predicates are applied, etc, on the local server.

Evidently, Remote Query is usually a more efficient operator to have in your query's execution plan as opposed to a Remote Scan operator.

By the way, this is a great website with a quick description of the different query plan operators.

I've seen a lot of adhoc answers on query re-writes to try and fix plans with Remote Scan and force a Remote Query operation instead. I'm not positive on exactly what the SQL Engine is doing under the hood when it chooses one over the other, but I have consistently seen that when your query uses the primary key (and more likely resultantly the clustered index) field of the remote entities, as part of the predicates, it results in a Remote Query operation.