A client have a data warehouse server with a linked server set-up. The data warehouse server is SQL Server 2014, while the linked server is a SQL Server, but version unknown.
Those two servers was previously physically at the same location, but a while ago the linked server was moved across the Atlantic. The performance was drastically hurt.
While I do understand that the latency is affected, I did not expect the throughput to be hurt as much. I'm talking about 20 minutes before and 15 hours after the relocation.
Monitoring wait stats shows pretty much async_network_io only.
I've tried to research how a query is processed across linked servers, but have not been able to find something useful. It appears to me that it's not as simple as the data warehouse sending a query to the linked server and the linked server starts to feed data back to the data warehouse. I do get the impression that it is continuous messages sent back and forth and therefor the latency plays a huge part.
The queries are simple selects from single tables.
Setting this up in SSIS improves performance to an acceptable level, but will take a lot of time to set up.
Anyone knows what's going on here?
UPDATE – Further details requested from David S
A typical table that is selected from the linked server and inserted to the local server is ~5 000 000 rows @ 4 GB
The query runs on a view:
CREATE VIEW [dbo].[Order_line]
AS
SELECT [ordno]
,[divcode]
,dbo.UTCOFFSET([regdate]) as regdate
...
,[o08rqtyb]
,[marowver]
FROM [LinkedServer].[Db1].[dbo].[O08T1]
The query to insert and update rows looks like this (Contained in a SP)
UPDATE
T1
SET
T1.[ordno] = T2.[ordno],
T1.[divcode] = T2.[divcode],
...
T1.[marowver] = T2.[marowver],
T1.[updateTime] = GETDATE()
FROM
[t_Order_line] T1
JOIN [Order_line] T2 ON
T1.shorto08 = T2.shorto08 and
T1.marowver < T2.marowver
INSERT INTO [ASTRO_Auburn].[dbo].[t_Order_line]
([ordno]
,[divcode]
...
,[marowver]
,[insertTime])
SELECT T1.[ordno]
,T1.[divcode]
...
,T1.[marowver]
,GETDATE()
FROM [Order_line] T1 WITH (NOLOCK)
LEFT JOIN [t_Order_line] T2 ON
T1.shorto08 = t2.shorto08
WHERE t2.[ordno] IS NULL
I know that there are questionable decisions taken when this was set-up, but it is the legacy from another time.
Fetching the query plan that is executed on the remote server only shows a Table scan. Possibly that scan is executed twice? But that's not the major issue here.
Best Answer
There are a number of details that you need to include that may have an effect here. Are they many small resultsets (a few rows or small rows) or small numbers of large results (many rows and/or large columns like BLOBs)? If it is a mix can you separate them out and test the performance separately?
How are you making
SELECTS
to the remote end? Direct inserts into a local table with something likeINSERT localtable SELECT stuff FROM server.db.schema.table
or are you reading the results into your own code for further processing beforeINSERT
? If the latter than you might need to look into which cursor options you (or the framework you are calling through) are using and see if there are less latency sensitive options that are applicable (assuming the framework you are using allows you this level of control). If the options used mean that each row is request separately then latency will be far more of an issue then if they allow the rows to be read over the network in bulk and navigated locally.If you are using exactly the same statements to retrieve the data in SSIS as your previous method then that could imply one of three things, or a mix of them: