You can turn on trace flag 7300, which might give you a more detailed error message.
How many rows does a representative query return? How fast/reliable is the network connection between the two servers?
It's possible that a large dataset is taking too long to transfer (on top of the actual query time). You could raise the timeout value.
You can try to reconfigure the timeout setting as follows:
Set the remote login timeout to 300 seconds:
sp_configure 'remote login timeout', 300
go
reconfigure with override
go
Set the remote query timeout to 0 (infinite wait):
sp_configure 'remote query timeout', 0
go
reconfigure with override
go
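To confirm the change took effect, you can check the running values in sys.configurations; a minimal sketch (the configuration names below are as they appear in sys.configurations):

```sql
-- Verify the remote timeout settings currently in use
SELECT name, value_in_use
FROM sys.configurations
WHERE name IN ('remote login timeout (s)', 'remote query timeout (s)');
```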
Update: From SQL Server 2012 SP1 onwards, users with SELECT
permission can run DBCC SHOW_STATISTICS,
which improves read-only performance over linked servers. Ref: https://msdn.microsoft.com/en-us/library/ms174384(v=sql.110).aspx
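In practice this means that on the remote server, granting plain SELECT on the referenced table is enough for the linked-server provider to fetch statistics; a sketch, where the user and table names are placeholders:

```sql
-- On the REMOTE server (2012 SP1+): SELECT permission alone now lets the
-- linked-server provider retrieve statistics via DBCC SHOW_STATISTICS.
-- "LinkedUser" and dbo.Orders are hypothetical names.
GRANT SELECT ON dbo.Orders TO LinkedUser;
```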
Update: You are correct in saying it's not the size of the data or the connection speed. It rang a bell in my foggy memory and I remembered where I had seen it: Slow in the Application, Fast in SSMS? (An Issue With Linked Servers). It's not parameter sniffing; it is the statistics themselves that are missing (due to permissions), causing a bad query plan to be used:
You can see that the estimates are different. When I ran as sysadmin,
the estimate was 1 row, which is a correct number, since there are no
Orders in Northwind where the order ID exceeds 20000. But when I ran
as a plain user, the estimate was 249 rows. We recognize this
particular number as 30 % of 830 orders, or the estimate for an
inequality operation when the optimizer has no information.
Previously, this was due to an unknown variable value, but in this
case there is no variable that can be unknown. No, it is the
statistics themselves that are missing.
As long as a query accesses only tables in the local server, the
optimizer can always access the statistics for all tables in the
query; there are no extra permission checks. But this is different
with tables on a linked server. When SQL Server accesses a linked
server, there is no secret protocol that is only used for inter-server
communication. No, instead SQL Server uses the standard OLE DB
interface for linked servers, be it other SQL Server instances, Oracle,
text files or your home-brewed data source, and connects just like any
other user. Exactly how statistics are retrieved depends on the data
source and the OLE DB provider in question. In this case, the provider
is SQL Server Native Client which retrieves the statistics in two
steps. (You can see this by running Profiler against the remote
server). First the provider runs the procedure
sp_table_statistics2_rowset which returns information about which
column statistics there are, as well as their cardinality and their
density information. In the second step, the provider runs DBCC
SHOW_STATISTICS, a command that returns the full distribution
statistics. (We will look closer at this command later in this
article.) Here is the catch: to run DBCC SHOW_STATISTICS, you must be
a member of the server role sysadmin or of one of the database roles
db_owner or db_ddladmin.
And this is why I got different results. When running as sysadmin I
got the full distribution statistics which indicated that there are no
rows with order ID > 20000, and the estimate was one row. (Recall that
the optimizer never assumes zero rows from statistics.) But when
running as the plain user, DBCC SHOW_STATISTICS failed with a
permission error. This error was not propagated, but instead the
optimizer accepted that there were no statistics and used default
assumptions. Since it did get cardinality information, it learnt that
the remote table has 830 rows, whence the estimate of 249 rows.
Whenever you encounter a performance problem where a query that
includes access to a linked server is slow in the application, but it
runs fast when you test it from SSMS, you should always investigate if
insufficient permissions on the remote database could be the cause.
(Keep in mind that the access to the linked server may not be overt in
the query, but could be hidden in a view.) If you determine that
permissions on the remote database is the problem, what actions could
you take?
You can add the users to the role db_ddladmin, but since this gives them the right to add and drop tables, it is not recommended.
By default, when users connect to a remote server they connect as themselves, but you can set up a login mapping with
sp_addlinkedsrvlogin, so that users map to a proxy account that has
membership in db_ddladmin. Note that this proxy account must be an SQL
login, so this is not an option if the remote server does not have SQL
authentication enabled. This solution too is somewhat dubious from a
security perspective, although it's better than the previous suggestion.
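Such a mapping can be sketched as follows; the server, login, and password names are placeholders, and the remote login would then be given db_ddladmin membership in the remote database:

```sql
-- Map ALL local logins to one remote SQL login on the linked server.
-- @useself = 'FALSE' means "use the supplied remote credentials"
-- rather than the caller's own identity.
EXEC sp_addlinkedsrvlogin
    @rmtsrvname  = N'RemoteServer',   -- hypothetical linked server name
    @useself     = 'FALSE',
    @locallogin  = NULL,              -- NULL = applies to all local logins
    @rmtuser     = N'StatsProxy',     -- hypothetical remote SQL login
    @rmtpassword = N'StrongPassword!';
```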
In some cases you can rewrite the query with OPENQUERY to force evaluation on the remote server. This can be particularly useful if
the query includes several remote tables. (But it can also backfire,
because the optimizer now gets even less statistics information from
the remote server.)
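As an illustration of that rewrite, here is a sketch using the Northwind example from the quoted article (server and object names are assumptions):

```sql
-- Instead of a four-part name like RemoteServer.Northwind.dbo.Orders,
-- push the whole statement to the remote server, where it is optimized
-- with full access to that server's own statistics.
SELECT *
FROM OPENQUERY(RemoteServer,
    'SELECT OrderID, CustomerID
     FROM Northwind.dbo.Orders
     WHERE OrderID > 20000');
```

Note the trade-off mentioned above: the local optimizer now treats the OPENQUERY result as a black box, so joins against local tables may get even weaker estimates.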
You could of course use the full battery of hints and plan guides to get the plan you want.
Finally, you should ask yourself whether that linked-server access is needed. Maybe the databases could be on the same server? Could data
be replicated? Some other solution?
sys.dm_exec_query_plan returns a dbid column; you could just filter on that for the database that is being mirrored.
Or, you could get fancy and nest a subquery that filters for any dbid that is in a mirrored state by going out to the sys.database_mirroring catalog view.
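That nested filter might look like the following sketch; sys.database_mirroring has one row per database, with mirroring_guid NULL when the database is not mirrored:

```sql
-- Cached plans belonging to any database that is currently mirrored
SELECT qs.plan_handle, qp.dbid, qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE qp.dbid IN (SELECT database_id
                  FROM sys.database_mirroring
                  WHERE mirroring_guid IS NOT NULL);
```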
Best Answer
At the risk of stating the obvious, first decide what you are optimizing for. Almost always it is elapsed time, though you may be trying to reduce CPU use (to reduce license fees) or IO (to avoid a disk upgrade, say). Then find a way to measure that thing, holding other variables constant. Be sure your chosen measure actually measures what you target and not a bunch of other stuff too.
For example, running a query in SSMS and looking at the status bar execution time will show round-trip network time plus plan compilation time plus IO time plus query execution time plus client caching time plus SSMS rendering time. For large result sets and slow networks the overhead and variability may swamp any actual improvement.
Having a completely clean buffer pool and plan cache for each iteration will give great reproducibility. It is unlikely to match the actual production scenario, however. Since production is what you're trying to make fast, I'd suggest you test in a production-like environment, with similar memory and disk, plans compiled, and the working set in the buffer pool.
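If you do want the clean-cache variant for a controlled comparison, the usual sequence is a sketch like this (run it only on a test box, never in production, since it evicts everything for every session):

```sql
CHECKPOINT;               -- write dirty pages so clean buffers can be dropped
DBCC DROPCLEANBUFFERS;    -- empty the buffer pool (forces physical reads)
DBCC FREEPROCCACHE;       -- empty the plan cache (forces recompilation)
SET STATISTICS TIME ON;   -- report CPU time and elapsed time per statement
-- ... run the query under test here ...
```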