Sql-server – How to query data from a linked server, and pass it parameters to filter by

linked-serverperformancequery-performancesql serversql-server-2005

I have a really big query that needs to be run on multiple databases, and the results appended to a temp table and returned.

The basic syntax looks something like this:

INSERT INTO #tmpTable (Id, ...)

SELECT T1.Id, ...
FROM Server.Database.dbo.Table1 as T1
INNER JOIN #tmpIds as T ON T1.Id = T.Id

INNER JOIN Server.Database.dbo.Table2 as T2 ON T1.Id = T2.Id
INNER JOIN Server.Database.dbo.Table3 as T3 ON T2.Id = T3.Id
LEFT OUTER JOIN Server.Database.dbo.Table4 as T4 ON T3.Id = T4.Id
LEFT OUTER JOIN Server.Database.dbo.Table5 as T5 ON T4.Id = T5.Id
LEFT OUTER JOIN Server.Database.dbo.Table6 as T6 ON T5.Id = T6.Id

The query runs quickly if run locally on the the individual servers, however it takes a long time to run if it's run from a linked server using the 4-part names like above.

The problem appears to be it's querying the linked server for the unfiltered result set first, then joining it to the #tmpIds table on the local server afterwards, which makes the query take an very long time to run.

If I hardcode the Ids to filter the result set on the linked server, such as

SELECT T1.Id, ...
FROM Server.Database.dbo.Table1 as T1
-- INNER JOIN #tmpIds as T ON T1.Id = T.Id
INNER JOIN Server.Database.dbo.Table2 as T2 ON T1.Id = T2.Id
INNER JOIN Server.Database.dbo.Table3 as T3 ON T2.Id = T3.Id
LEFT OUTER JOIN Server.Database.dbo.Table4 as T4 ON T3.Id = T4.Id
LEFT OUTER JOIN Server.Database.dbo.Table5 as T5 ON T4.Id = T5.Id
LEFT OUTER JOIN Server.Database.dbo.Table6 as T6 ON T5.Id = T6.Id

WHERE T1.Id IN (1, 2, 3)

it runs quickly in just a few seconds.

Is there a way to run this query so it filters the result set of the query from the linked server by the #tmpId table first, before returning the result set to the local server?

Some things to note

  • The query is very large and complex, and Dynamic SQL is not a viable option for it due to the maintenance nightmare that causes.

    I would be open to suggestions about how to use Dynamic SQL for something else though, such as running a stored procedure or UDF if there's a way to do that over a linked server (tried a few different methods such as sp_executeSQL, OPENROWSET, and OPENQUERY, but those all failed).

  • Because it uses the 4-part naming convention, I cannot use a UDF on the remote server
  • Distributed Transactions are disabled, so the following does not work

    INSERT INTO #table 
    EXEC Server.Database.dbo.StoredProcedure @ids
    

Best Answer

The performance problem actually had to do with the LEFT OUTER JOIN tables. If I changed them to INNER JOIN, or if I excluded their data from the SELECT columns, the query ran fine.

What I ended up doing was creating a View on the linked server containing all the data I wanted from it, then simply joining to it from the primary server with the #tmpIds table.

I didn't think this would work since I thought joining everything and pulling it down to the second server before filtering was the same as what I was doing now, and would lead to the same performance problem, but surprisingly that doesn't appear to be the case.

CREATE VIEW MyView 
AS

SELECT T1.Id, T2.ColA, ...
FROM Table1 as T1
INNER JOIN Table2 as T2 ON T1.Id = T2.Id
INNER JOIN Table3 as T3 ON T2.Id = T3.Id
LEFT OUTER JOIN Table4 as T4 ON T3.Id = T4.Id
LEFT OUTER JOIN Table5 as T5 ON T4.Id = T5.Id
LEFT OUTER JOIN Table6 as T6 ON T5.Id = T6.Id

GO

and

INSERT INTO #tmpTable (Id, ...)

SELECT T1.Id, T1.ColA, ...
FROM Server.Database.dbo.MyView as T1
INNER JOIN #tmpIds as T ON T1.Id = T.Id

All the joined columns were correctly indexed, however according to this answer

Even though there may be indexing on tables on the remote server, SQL may not be able to take advantage of them while it can build a local query plan that does take advantages of indexing.

And this one

Let the linked server do as much as possible.
It is impossible for SQL Server to optimise a query on a linked server, even another SQL Server

so I am guessing that the query plan used for the query was not using the Indices defined, and SQL Server was generating a poor query plan for the LEFT OUTER JOIN tables.