Sql-server – Where do Linked Server Queries get executed

distributed-queriesdistributed-transactionslinked-serveropenrowsetsql server

I have two instances ServerA and ServerB, and I have created a linked server in ServerA for ServerB as Linksrv_B.

I can execute a query on ServerA using the four part naming convention:

SELECT * FROM Linksrv_B.master.sys.databases

or OPENQUERY():

SELECT * FROM OPENQUERY(Linksrv_B, 'SELECT * FROM master.sys.databases')  

I know linked server works across heterogeneous database using distributed transaction as a rowset.

  1. Where does the OLEDB provider gets initiated/connected? ServerA or ServerB?
  2. Where does the query gets executed on ServerA or ServerB? Does optimizer comes into play if so on which server?
  3. Where does the result set get cached?
  4. Could anyone give us a detailed explanation how linked server works.

Best Answer

1 - The OLEDB provider is initiated on Server A, Server B sees it as a normal connection and query, it doesn't know that Sever A is a SQL Server, it's just a client to it.

2 - The query is executed on both. Server A passes the query to Server B which executes it, then Server A works with the results. The optimizer comes into play on both, but can't talk to the other servers.

Given a query of no complexity the performance will be equivalent. However, when you start getting into complex queries or large datasets, you will see performance gains by fetching just what you need from Server B into a temp table and then using that with the data contained on Server A. (Assuming you need to mix and match data across servers). Essentially, with the OPENQUERY syntax, you pass the query as a whole to server B (hopefully including a where clause) which then returns data to Serer A. With the four part naming, you have no control and the optimizer rapidly will decide to just fetch all of the data in the table from Server B and then eliminate based on other criteria (joins, where, etc.).

3 - The result is not cached (except as SQL caches data normally). Unless you store the results from the linked server query into a temp table somewhere.

4 - If you are familiar with how applications connect to a database and execute queries then the process is very similar. Except for authentication differences it's the same, just that it allows for easily mixing and matching data from multiple servers into a single result set. But your performance will depend greatly on how much data is pulled from Server B and staying aware of the optimizer's limited ability to prevent you from pulling the entire table from Server B, even if you only ended up needing 5 of those rows.