Sql-server – Latency appear to be an issue with linked server

linked-serversql server 2014

A client have a data warehouse server with a linked server set-up. The data warehouse server is SQL Server 2014, while the linked server is a SQL Server, but version unknown.

Those two servers was previously physically at the same location, but a while ago the linked server was moved across the Atlantic. The performance was drastically hurt.

While I do understand that the latency is affected, I did not expect the throughput to be hurt as much. I'm talking about 20 minutes before and 15 hours after the relocation.

Monitoring wait stats shows pretty much async_network_io only.

I've tried to research how a query is processed across linked servers, but have not been able to find something useful. It appears to me that it's not as simple as the data warehouse sending a query to the linked server and the linked server starts to feed data back to the data warehouse. I do get the impression that it is continuous messages sent back and forth and therefor the latency plays a huge part.

The queries are simple selects from single tables.

Setting this up in SSIS improves performance to an acceptable level, but will take a lot of time to set up.

Anyone knows what's going on here?

UPDATE – Further details requested from David S

A typical table that is selected from the linked server and inserted to the local server is ~5 000 000 rows @ 4 GB

The query runs on a view:

CREATE VIEW [dbo].[Order_line]
AS
SELECT [ordno]
  ,[divcode]
  ,dbo.UTCOFFSET([regdate]) as regdate
  ...
  ,[o08rqtyb]
  ,[marowver]
FROM [LinkedServer].[Db1].[dbo].[O08T1]

The query to insert and update rows looks like this (Contained in a SP)

UPDATE
    T1
SET
    T1.[ordno] = T2.[ordno],
    T1.[divcode] = T2.[divcode],
    ...
    T1.[marowver] = T2.[marowver],
    T1.[updateTime] = GETDATE()
FROM
    [t_Order_line] T1 
JOIN [Order_line] T2 ON
    T1.shorto08 = T2.shorto08 and
    T1.marowver < T2.marowver

INSERT INTO [ASTRO_Auburn].[dbo].[t_Order_line]
       ([ordno]
       ,[divcode]
       ...
       ,[marowver]
       ,[insertTime])
SELECT T1.[ordno]
    ,T1.[divcode]
    ...
    ,T1.[marowver]
    ,GETDATE()
FROM [Order_line] T1 WITH (NOLOCK)
LEFT JOIN [t_Order_line] T2 ON
    T1.shorto08 = t2.shorto08
WHERE t2.[ordno] IS NULL

I know that there are questionable decisions taken when this was set-up, but it is the legacy from another time.

Fetching the query plan that is executed on the remote server only shows a Table scan. Possibly that scan is executed twice? But that's not the major issue here.

Best Answer

The queries are simple selects from single tables.

There are a number of details that you need to include that may have an effect here. Are they many small resultsets (a few rows or small rows) or small numbers of large results (many rows and/or large columns like BLOBs)? If it is a mix can you separate them out and test the performance separately?

How are you making SELECTS to the remote end? Direct inserts into a local table with something like INSERT localtable SELECT stuff FROM server.db.schema.table or are you reading the results into your own code for further processing before INSERT? If the latter than you might need to look into which cursor options you (or the framework you are calling through) are using and see if there are less latency sensitive options that are applicable (assuming the framework you are using allows you this level of control). If the options used mean that each row is request separately then latency will be far more of an issue then if they allow the rows to be read over the network in bulk and navigated locally.

Setting this up in SSIS improves performance to an acceptable level

If you are using exactly the same statements to retrieve the data in SSIS as your previous method then that could imply one of three things, or a mix of them:

  1. The cursor options as mentioned above (SSIS is more likely to be favour latency friendly options which may in other circumstances be sub-optimal, because it is more often used to work with more widely distributed servers)
  2. Connection pooling which your current method might not be making as much advantage of (again, here we need more details to do more than guess). SSIS will use the same connection rather than renegotiating for each query, your other process may not be doing this.
  3. Possibly concurrency: is your workflow in SSIS allowing it to submit several queries at the same time, so it can submit new queries to the remote end without first having to wait for the previous ones to complete, where the workflow in your current process is entirely sequential?