You can turn on trace flag 7300, which might give you a more detailed error message.
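For example, a minimal sketch to enable it instance-wide while you reproduce the error (assuming you have permission to set trace flags):

-- Turn on verbose linked-server (OLE DB) error messages via trace flag 7300
DBCC TRACEON (7300, -1);
-- ...reproduce the failing query, then turn it off again:
DBCC TRACEOFF (7300, -1);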
How many rows does a representative query return? How fast/reliable is the network connection between the two servers?
It's possible that a large dataset is taking too long to transfer (on top of the actual query time). You could raise the timeout value.
You can try to reconfigure the timeout setting as follows:
Set the remote login timeout to 300 seconds:
sp_configure 'remote login timeout', 300
go
reconfigure with override
go
Set the remote query timeout to 0 (infinite wait):
sp_configure 'remote query timeout', 0
go
reconfigure with override
go
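You can verify the running values afterwards (run_value should reflect the change once RECONFIGURE has been issued):

-- Show config_value and run_value for both settings
EXEC sp_configure 'remote login timeout';
EXEC sp_configure 'remote query timeout';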
Update: from SQL Server 2012 SP1 onwards, users with SELECT permission can run DBCC SHOW_STATISTICS, which improves read-only performance on linked servers. Ref: https://msdn.microsoft.com/en-us/library/ms174384(v=sql.110).aspx
Update: You are correct in saying it's not the size of the data or the connection speed. It rang a bell in my foggy memory and I remembered where I had seen it: Slow in the Application, Fast in SSMS? (An Issue With Linked Servers). It's not parameter sniffing; it is the statistics themselves that are missing (due to permissions), causing a bad query plan to be used:
You can see that the estimates are different. When I ran as sysadmin,
the estimate was 1 row, which is a correct number, since there are no
Orders in Northwind where the order ID exceeds 20000. But when I ran
as a plain user, the estimate was 249 rows. We recognize this
particular number as 30% of 830 orders, or the estimate for an
inequality operation when the optimizer has no information.
Previously, this was due to an unknown variable value, but in this
case there is no variable that can be unknown. No, it is the
statistics themselves that are missing.
As long as a query accesses only tables in the local server, the
optimizer can always access the statistics for all tables in the
query; there are no extra permission checks. But this is different
with tables on a linked server. When SQL Server accesses a linked
server, there is no secret protocol that is only used for inter-server
communication. No, instead SQL Server uses the standard OLE DB
interface for linked servers, be it other SQL Server instances, Oracle,
text files or your home-brewed data source, and connects just like any
other user. Exactly how statistics are retrieved depends on the data
source and the OLE DB provider in question. In this case, the provider
is SQL Server Native Client which retrieves the statistics in two
steps. (You can see this by running Profiler against the remote
server). First the provider runs the procedure
sp_table_statistics2_rowset which returns information about which
column statistics there are, as well as their cardinality and their
density information. In the second step, the provider runs DBCC
SHOW_STATISTICS, a command that returns the full distribution
statistics. (We will look closer at this command later in this
article.) Here is the catch: to run DBCC SHOW_STATISTICS, you must be
a member of the server role sysadmin or any of the database roles
db_owner or db_ddladmin.
And this is why I got different results. When running as sysadmin I
got the full distribution statistics which indicated that there are no
rows with order ID > 20000, and the estimate was one row. (Recall that
the optimizer never assumes zero rows from statistics.) But when
running as the plain user, DBCC SHOW_STATISTICS failed with a
permission error. This error was not propagated, but instead the
optimizer accepted that there were no statistics and used default
assumptions. Since it did get cardinality information, it learnt that
the remote table has 830 rows, whence the estimate of 249 rows.
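To see that second step for yourself, you can run the command directly against the remote table; a minimal sketch (the Northwind-style table and statistics names are just for illustration):

-- Returns the header, density vector and histogram for the statistics object.
-- Before SQL Server 2012 SP1 this requires sysadmin, db_owner or db_ddladmin.
DBCC SHOW_STATISTICS ('dbo.Orders', 'PK_Orders');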
Whenever you encounter a performance problem where a query that
includes access to a linked server is slow in the application, but it
runs fast when you test it from SSMS, you should always investigate if
insufficient permissions on the remote database could be the cause.
(Keep in mind that the access to the linked server may not be overt in
the query, but could be hidden in a view.) If you determine that
permissions on the remote database are the problem, what actions could
you take?
You can add the users to the role db_ddladmin, but since this gives them the right to add and drop tables, this is not recommended.
By default, when users connect to a remote server they connect as themselves, but you can set up a login mapping with
sp_addlinkedsrvlogin, so that users map to a proxy account that has
membership in db_ddladmin. Note that this proxy account must be an SQL
login, so this is not an option if the remote server does not have SQL
authentication enabled. This solution too is somewhat dubious from a
security perspective, although it's better than the previous suggestion.
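A sketch of such a mapping (the server, login and password names here are placeholders):

-- Map a local Windows login to a SQL login on the linked server that is
-- a member of db_ddladmin in the remote database
EXEC sp_addlinkedsrvlogin
    @rmtsrvname  = N'REMOTESRV',
    @useself     = 'FALSE',
    @locallogin  = N'MyDomain\AppUser',
    @rmtuser     = N'stats_proxy',
    @rmtpassword = N'********';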
In some cases you can rewrite the query with OPENQUERY to force evaluation on the remote server. This can be particularly useful if
the query includes several remote tables. (But it can also backfire,
because the optimizer now gets even less statistics information from
the remote server.)
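For example, instead of joining to the four-part name and letting the local optimizer guess, you can push the work to the remote side (the server, database and column names below are illustrative):

-- The whole inner query is evaluated on the linked server
SELECT o.OrderID, o.CustomerID
FROM OPENQUERY(REMOTESRV,
    'SELECT OrderID, CustomerID FROM Northwind.dbo.Orders WHERE OrderID > 20000') AS o;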
You could of course use the full battery of hints and plan guides to get the plan you want.
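One such hint is the REMOTE join hint, which asks the optimizer to perform the join on the site of the right-hand (remote) table; a sketch with made-up names:

-- Valid for INNER JOIN only; useful when the local table is small
-- and the remote table is large
SELECT l.OrderID, c.CompanyName
FROM dbo.LocalOrders AS l
INNER REMOTE JOIN REMOTESRV.Northwind.dbo.Customers AS c
    ON l.CustomerID = c.CustomerID;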
Finally, you should ask yourself whether that linked-server access is needed. Maybe the databases could be on the same server? Could data
be replicated? Some other solution?
This isn't a complete answer, as I admit I don't know the complete answer, but it won't fit into a comment.
Short detour - regarding SQLOS memory, there is a wealth of information from Slava Oks, who worked on the internals of the SQL Server storage engine team. He has lots of articles on the MSDN blogs about the SQLOS memory manager, and they are all worth a read. Now I know we're on to SQL Server 2012, but the info is valuable nonetheless.
Now, regarding your first question - whether there's a limit on how much memory the database page cache can take before it is cleared out, and how that is controlled - I'll quote his words on it.
From the article SQLOS's memory manager and SQL Server's Buffer Pool:
Buffer Pool commits pages on demand. Depending on internal memory requirements and external memory state, it calculates its target, the amount of memory it thinks it should commit before it can get into memory pressure. To keep the system out of paging, the target is constantly recalculated. Target memory can't exceed the max memory that represents the max server memory setting. Even if you set min server memory equal to max server memory, the Buffer Pool will only commit its memory on demand. You can observe this behavior by monitoring the corresponding profiler event.
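You can watch that target versus what is actually committed from the memory manager counters; a minimal sketch:

-- Target vs. currently committed memory for the instance
SELECT counter_name, cntr_value AS kb
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Memory Manager%'
  AND counter_name IN ('Target Server Memory (KB)', 'Total Server Memory (KB)');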
From this article's comments I understand that SQL Server uses an LRU (least recently used) algorithm. If you really need to go in depth, SQL Server's LRU algorithm in particular is described in the following research paper: The LRU-K Page Replacement Algorithm For Database Disk Buffering.
Now, for your second question: the log caches are not really exposed, so they can't be changed or tuned, as they are part of the SQLOS internal memory structure. Quoting from an answer on the MSDN forums:
Essentially each database has a log buffer/cache for its transaction
log, which is a small contiguous memory allocation that is 60K in size,
used by active transactions, and flushed when a transaction
commits or when the buffer space fills up. Multiple active
transactions can be interleaved in the log buffer, and if any one
transaction commits, the buffer memory is written to the transaction
log file by the log writer. If the space fills up, the contents get
flushed to disk allowing the buffer/cache memory to be reused. This
is why log IO can range from 512 bytes to 64K (it's actually 60K), but it
is always sequential, so varying write sizes don't matter that much.
The format of the log records is different than data pages as
documented in the BOL
(http://msdn.microsoft.com/en-us/library/ms190925.aspx), so the size
of a log record is variable, but the size of the log buffer/cache is
consistent.
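While the log buffer itself isn't tunable, you can observe the flushing behaviour the quote describes through the per-database performance counters; a sketch:

-- Flush activity per database: the buffer is written on commit or when it fills
SELECT instance_name AS database_name, counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%:Databases%'
  AND counter_name IN ('Log Flushes/sec', 'Log Bytes Flushed/sec');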
Best Answer
Here is an extended event session to find any DBCC calls that have not already occurred. From your question it looks like you may have wanted to find ones in the past? As others have alluded to, there should be info in the logs on cache clears (at least on SQL 2014). For the future you could use the Extended Event definition sketched below.
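A minimal sketch of such a session, capturing completed batches whose text contains DBCC (the session and file names are made up; adjust the filter to taste):

-- Capture who ran DBCC commands, from which host, and the statement text
CREATE EVENT SESSION CaptureDbccCalls ON SERVER
ADD EVENT sqlserver.sql_batch_completed (
    ACTION (sqlserver.server_principal_name,
            sqlserver.client_hostname,
            sqlserver.sql_text)
    WHERE (sqlserver.like_i_sql_unicode_string(batch_text, N'%DBCC%'))
)
ADD TARGET package0.event_file (SET filename = N'CaptureDbccCalls')
WITH (STARTUP_STATE = ON);
GO
ALTER EVENT SESSION CaptureDbccCalls ON SERVER STATE = START;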
It will get you some information like this (my user and server name are redacted):
If you want a trace (SQL Profiler) definition, I could probably throw one of those together too.
Hope this helps.
Cheers