Sql-server – Which timeout(s) are causing the linked server query to time out

linked-serverquery-timeoutsql server

We have two databases on linked SQL Servers that are partially synchronized. Occasionally, one side goes down for an upgrade, then the other goes down for an upgrade, and then we re-synchronize.

Upgrades can take upwards of two days. Re-synchronizing two days worth of data is a large operation, so we broke it into batches. During testing, we completed twenty batches successfully before this message appeared in the query window in SSMS:

OLE DB provider "SQLNCLI11" for linked server "MyLinkedServer" returned message "Query timeout expired".

We restored, tested it again, and the operation was successful. We tested on a slower pair of machines, and it timed out on the first batch. We concluded that environmental noise (other processes on the server) and random variance is causing our query to occasionally take too long.

Here is the query that is timing out:

MERGE INTO dbo.MyTable AS target
USING (
    SELECT localData.ID, remoteData.Name AS Name, remoteData.Value AS Value
    FROM OPENQUERY (
        MyLinkedServer,'
        SELECT TOP 500000 Name, Value
        FROM MyDatabase.dbo.MyTable
        WHERE Synced = 0'
    ) AS remoteData
    JOIN dbo.MyTable AS localData ON localData.Name = remoteData.Name
    WHERE localData.Synced = 0
) AS source
ON source.ID = target.ID
-- We need to pull this record
WHEN NOT MATCHED BY target THEN 
    INSERT (
        Name, Value, Synced
    ) VALUES (
        source.Name, source.Value, 1
    )
-- We already have this record
WHEN MATCHED THEN
    UPDATE SET target.Synced = 1
-- There is an OUTPUT clause here that collects the remote IDs of all of 
-- the rows that were synced and stores them in a table. We update the
-- remote table before we do the next batch, so that we don't sync the same
-- 500000 rows over and over again.
;

The query above represents one batch. It replaces batchablequery below. We separated it out like this because we sync more than one table.

DECLARE @count INT = 500000
WHILE @count > 0
BEGIN
    batchablequery
    SET @count = @@ROWCOUNT

    INSERT INTO MyLogTable 
    VALUES(GETDATE(), 'We finished a batch of size ' + CONVERT(varchar(20), @count))

    CHECKPOINT
END

We could fix the query timeout in one of three ways:

  • Improve query performance
    • It spends most of its time waiting with a PAGEIOLATCH_SH
  • Decrease or throttle batch size
  • Increase timeout

Before we choose a solution, we want to fully understand which timeouts are affecting us. We are confused by which ones are applicable to our query. Here are our current settings, and some notes we took from MSDN:

  • Server Remote Query Timeout
    • Ours is set to 600
  • Linked Server Connection Timeout
    • Ours is set 0, so sp_configure Remote Query Timeout Option is used instead.
  • Linked Server Query Timeout
    • Ours is set to 0, so sp_configure Query Wait option is used instead.
  • Query Wait
    • Ours is set to -1, so 25 times the estimated query cost is used.
    • The estimated query cost doesn't have a unit of measure on it…

Which setting or settings are the cause of our timeout?

Best Answer

You could try putting the remoteData into a temp table before doing the merge. a lot of your time will be taken up with comparisons of data not the actual retrieval so if you run the

SELECT remoteData.Name AS Name, remoteData.Value AS Value
into #remoteData
    FROM OPENQUERY (
        MyLinkedServer,'
        SELECT TOP 500000 Name, Value
        FROM MyDatabase.dbo.MyTable
        WHERE Synced = 0'
    )

then in the merge statement

MERGE INTO dbo.MyTable AS target
USING (
    SELECT localData.ID, remoteData.Name AS Name, remoteData.Value AS Value
    FROM #remoteData as remoteData
    JOIN dbo.MyTable AS localData ON localData.Name = remoteData.Name
    WHERE localData.Synced = 0
) AS source

It may help (can't promise)