SQL Server – How to Optimize Remote Scan on Linked Server

azure-sql-database, sql-server

I'm trying to figure out how to optimize/improve the following setup: a VM and a database, both hosted in MS Azure. Besides the instance hosted in Azure, there is another, local instance with a dedicated database used for an integration between two software applications, and a linked server connects the two instances.

Lately, I've noticed that the Remote Scan operator accounts for up to 100% of the plan cost, causing poor performance in some stored procedures that update the database hosted in Azure.

Here's what the execution plan looks like:

[execution plan screenshot]

The part of the stored procedure, executed against the database on the local instance, that performs updates over the linked server looks like this:

UPDATE docl
SET docl.udfvalue1=WMSC.QtyReceptioned,
    docl.udfvalue2=WMSC.QtySent,
    docl.Stringfield1=WMSC.WAREHOUSE
FROM  MyLinkedServer.DataBase.dbo.DocLines docl
INNER JOIN #WMS_Confirmed WMSC on docl.gid=WMSC.GID

UPDATE es 
SET es.ADFLAG3 = 1
FROM MyLinkedServer.Database.dbo.DocHeader es
WHERE es.OrderDate>'2018-01-01' -- OrderDate is an indexed column on the `DocHeader` table
AND EXISTS (SELECT 1
              FROM #WMS_Confirmed WMSC 
              WHERE es.gid = WMSC.fDocGID
             );

The suggestions I found online, such as enabling Dynamic Parameters and Nested Queries on the linked server provider, or adding OPTION (RECOMPILE), didn't help at all. Is there something else that I am missing? Thanks

Best Answer

Based on my personal experience, you will get far better performance by transferring the data between the two servers and performing the update on the target server.

After monitoring one of these queries at a more detailed level (i.e. the amount of data transferred between the servers), I determined that at least 90% of the remote table is pulled over to the local server before the comparisons are run. This may be improved by indexing your temp tables correctly, but no guarantees.
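For instance, indexing the temp table's join columns before running the updates is cheap to try (the column names below are taken from the question's queries; the index names are hypothetical):

```sql
-- Index the temp table columns used to join against the remote tables
CREATE CLUSTERED INDEX IX_WMSC_GID ON #WMS_Confirmed (GID);
CREATE NONCLUSTERED INDEX IX_WMSC_fDocGID ON #WMS_Confirmed (fDocGID);
```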

The reason is simple: the local server doesn't understand the data distribution on the remote server. It doesn't know the file structure, indexing, or any such information, so it assumes it will have to do a table scan to identify what needs to be changed (at least this is my assumption, based on experience). Manually transferring the data to the server being updated, then running a process there to do the update, is the only way I have achieved reasonable gains in efficiency.
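A minimal sketch of this "transfer, then update remotely" approach, reusing the tables from the question (the staging table `WMS_Confirmed_Staging` is hypothetical and must already exist in the remote database; `EXEC ... AT` also requires the linked server's RPC Out option to be enabled):

```sql
-- 1. Push the local temp table's rows to a staging table on the remote server.
--    This is a straight data copy over the link, not a cross-server join.
DELETE FROM MyLinkedServer.DataBase.dbo.WMS_Confirmed_Staging;

INSERT INTO MyLinkedServer.DataBase.dbo.WMS_Confirmed_Staging
        (GID, fDocGID, QtyReceptioned, QtySent, WAREHOUSE)
SELECT   GID, fDocGID, QtyReceptioned, QtySent, WAREHOUSE
FROM #WMS_Confirmed;

-- 2. Run the updates entirely on the remote server, so the joins happen
--    where the data (and its indexes) live, and no remote scan is needed.
EXEC ('
    UPDATE docl
    SET docl.udfvalue1    = s.QtyReceptioned,
        docl.udfvalue2    = s.QtySent,
        docl.Stringfield1 = s.WAREHOUSE
    FROM dbo.DocLines docl
    INNER JOIN dbo.WMS_Confirmed_Staging s ON docl.gid = s.GID;

    UPDATE es
    SET es.ADFLAG3 = 1
    FROM dbo.DocHeader es
    WHERE es.OrderDate > ''2018-01-01''
      AND EXISTS (SELECT 1
                  FROM dbo.WMS_Confirmed_Staging s
                  WHERE es.gid = s.fDocGID);
') AT MyLinkedServer;
```

Wrapping step 2 in a remote stored procedure instead of dynamic SQL works just as well, and is easier to maintain.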

It is awkward, and I hope someone has a better answer for you, but there you have it - my personal success story for optimizing these types of cross-server queries between the cloud and a local server.