I have a really big query that needs to be run on multiple databases, and the results appended to a temp table and returned.
The basic syntax looks something like this:
INSERT INTO #tmpTable (Id, ...)
SELECT T1.Id, ...
FROM Server.Database.dbo.Table1 as T1
INNER JOIN #tmpIds as T ON T1.Id = T.Id
INNER JOIN Server.Database.dbo.Table2 as T2 ON T1.Id = T2.Id
INNER JOIN Server.Database.dbo.Table3 as T3 ON T2.Id = T3.Id
LEFT OUTER JOIN Server.Database.dbo.Table4 as T4 ON T3.Id = T4.Id
LEFT OUTER JOIN Server.Database.dbo.Table5 as T5 ON T4.Id = T5.Id
LEFT OUTER JOIN Server.Database.dbo.Table6 as T6 ON T5.Id = T6.Id
The query runs quickly if run locally on the the individual servers, however it takes a long time to run if it's run from a linked server using the 4-part names like above.
The problem appears to be it's querying the linked server for the unfiltered result set first, then joining it to the #tmpIds
table on the local server afterwards, which makes the query take an very long time to run.
If I hardcode the Ids to filter the result set on the linked server, such as
SELECT T1.Id, ...
FROM Server.Database.dbo.Table1 as T1
-- INNER JOIN #tmpIds as T ON T1.Id = T.Id
INNER JOIN Server.Database.dbo.Table2 as T2 ON T1.Id = T2.Id
INNER JOIN Server.Database.dbo.Table3 as T3 ON T2.Id = T3.Id
LEFT OUTER JOIN Server.Database.dbo.Table4 as T4 ON T3.Id = T4.Id
LEFT OUTER JOIN Server.Database.dbo.Table5 as T5 ON T4.Id = T5.Id
LEFT OUTER JOIN Server.Database.dbo.Table6 as T6 ON T5.Id = T6.Id
WHERE T1.Id IN (1, 2, 3)
it runs quickly in just a few seconds.
Is there a way to run this query so it filters the result set of the query from the linked server by the #tmpId
table first, before returning the result set to the local server?
Some things to note
-
The query is very large and complex, and Dynamic SQL is not a viable option for it due to the maintenance nightmare that causes.
I would be open to suggestions about how to use Dynamic SQL for something else though, such as running a stored procedure or UDF if there's a way to do that over a linked server (tried a few different methods such as
sp_executeSQL
,OPENROWSET
, andOPENQUERY
, but those all failed). - Because it uses the 4-part naming convention, I cannot use a UDF on the remote server
-
Distributed Transactions are disabled, so the following does not work
INSERT INTO #table EXEC Server.Database.dbo.StoredProcedure @ids
Best Answer
The performance problem actually had to do with the
LEFT OUTER JOIN
tables. If I changed them toINNER JOIN
, or if I excluded their data from theSELECT
columns, the query ran fine.What I ended up doing was creating a
View
on the linked server containing all the data I wanted from it, then simply joining to it from the primary server with the#tmpIds
table.I didn't think this would work since I thought joining everything and pulling it down to the second server before filtering was the same as what I was doing now, and would lead to the same performance problem, but surprisingly that doesn't appear to be the case.
and
All the joined columns were correctly indexed, however according to this answer
And this one
so I am guessing that the query plan used for the query was not using the Indices defined, and SQL Server was generating a poor query plan for the
LEFT OUTER JOIN
tables.