Sql-server – Select from linked server to local server causing waits on both sides

linked-serversql serverwait-types

The queries in question are here: http://pastie.org/4833351

Run individually, each of these queries runs through (albeit slow).

Run as a group, it just sits there, showing either OLEDB waits on the local box or in a SUSPENDED state on the linked server (wait type: ASYNC_NETWORK_IO).

The query finished (after 44 mins) with the following errors:

OLE DB provider "SQLNCLI10" for linked server "[MyServer]" returned message "Protocol error in TDS stream".

OLE DB provider "SQLNCLI10" for linked server "MyServer" returned message "Query timeout expired".

Msg 7330, Level 16, State 2, Line 127 Cannot fetch a row from OLE DB provider "SQLNCLI10" for linked server "MyServer".

Can anyone here point me in the direction of how I can both diagnose and fix this?

Best Answer

Pulling data through a linked server is unlikely to be quick especially if you are joining remote tables to local tables. (Just use Profiler to watch what your local server sends to the remote server and I think that you'll be convinced.) Linked Servers are convenient and often "good enough", but not when you start blowing through your batch processing windows.

I have seen overall processing go faster by pulling the remote data into local (temporary or permanent) tables and then "doing the joins" locally to figure out what to put into the production tables.

If you have a large amount of data, I suggest looking into using a SSIS package or bcp and probably bringing the data into a local staging table first and converting the code to run off of local tables. This might involve a lot of surgery to your jobs since you need to have a job step (or steps) to run bcp code and/or packages.

If you are using SQL 2008 or better, MERGE should be available and it could provide more efficient operations. I am unsure if MERGE works with tables on a linked server and, if it does, performance may not be much better than what you have. So, you are still stuck getting the data onto the local server.

Also, you could look into partition switching, but I think would be the most costly thing to do in terms of time spent on getting things to work. You could view it as a learning experience.

Alternatively, you might consider increasing the query timeout, which is a server-level setting on the local server. That might help with the timeout error (until your data volume increases, then you would need to tweak it again), but I doubt that it will help with the TDS error problem.

If you want to minimize the length of time that your production tables are locked (due to the INSERT, not the SELECT with nolock), you would want to get all of the data from the ETL system onto the local box first, then go and insert it all. You want to avoid pulling data for one table, inserting that, pulling the data for the next table, inserting that, etc. (I'm presuming that you are doing all of this in a declared transaction that I'm not seeing.)