SQL Server – Query involving TVF very slow on SQL Server 2008

Tags: linked-server, set-returning-functions, sql-server, sql-server-2014, sql-server-2008-r2

I have a query that runs in under 1 second on SQL Server 2014. A similar query takes over 12 minutes when run against a table on a linked server running SQL Server 2008. I don't know if it's because it's a linked server or because it's 2008. I found a blog post about TVFs confusing the query optimizer on 2008, but I'm not sure how to apply that info to my situation.

Here's the query on the local table:

select 'E10', InvoiceNum, MiscCode, Description, DocMiscAmt, Type
from Live.Erp.InvcMisc
where InvoiceNum in (
    select distinct ARInvoiceNum from dbo.UnifiedSalesJournal(2018, 5, default, default, default)
    where DB = 'E10'
)
and MiscCode not like '%SHI'
and MiscCode not like '%HAN'

[Execution plan]

Here's the query on the linked table. The difference in the plan is obvious:

select 'E9', InvoiceNum, MiscCode, Description, DocMiscAmt, Type
from tmiedb.Epicor905.Dbo.InvcMisc
where InvoiceNum in (
    select distinct ARInvoiceNum from dbo.UnifiedSalesJournal(2018, 3, default, default, default)
    where DB = 'E9'
)
and MiscCode not like 'SH%'
and MiscCode not like 'HD%'
and MiscCode <> 'THDL'

[Execution plan]

I got rid of the loop by rewriting it like this. This runs in about 1 second:

select 'E9', InvoiceNum, MiscCode, m.Description, DocMiscAmt, Type
from dbo.UnifiedSalesJournal(2018, 3, default, default, default) i
left join tmiedb.Epicor905.Dbo.InvcMisc m
on i.ARInvoiceNum = m.InvoiceNum

[Execution plan]

But if I add a where clause, this takes 29 minutes.

The data is small. The TVF returns a few thousand rows. The local table contains 10K rows, and the linked table contains 55K rows.

The linked server contains historical data that will never change, so I'm considering importing the whole database into SQL Server 2014. But I'm hoping there's a less drastic solution.

Possibly related: Linked Server behavior. I'm seeing the behavior described in Case 3a. It's calling sp_execute thousands of times, once for each row returned by the TVF.

Best Answer

Before SQL Server 2012 SP1, the permissions needed by the linked server login to read remote statistics were:

In order to view the statistics object, the user must own the table or the user must be a member of the sysadmin fixed server role, the db_owner fixed database role, or the db_ddladmin fixed database role.
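As a rough sketch of what that requirement means for the 2008 server in the question: the database user that the linked server login maps to could be added to db_ddladmin on the remote side. The user name LinkedReader and the statistics name PK_InvcMisc below are placeholders, not names from the question. Note that db_ddladmin grants DDL rights well beyond reading statistics, so this may not be acceptable everywhere.

```sql
-- Run on the remote SQL Server 2008 instance, not on the 2014 server.
use Epicor905;

-- LinkedReader is a hypothetical database user mapped to the linked server login.
exec sp_addrolemember @rolename = N'db_ddladmin', @membername = N'LinkedReader';

-- List the statistics that exist on the remote table, to pick one to test with.
select name
from sys.stats
where object_id = object_id(N'dbo.InvcMisc');

-- Check that the user can now read a statistics object.
-- PK_InvcMisc is a placeholder; substitute a name returned by the query above.
execute as user = N'LinkedReader';
dbcc show_statistics (N'dbo.InvcMisc', N'PK_InvcMisc') with stat_header;
revert;
```

If the DBCC call fails with a permissions error while impersonating the user, the optimizer on the calling server will not be able to read those statistics through the linked server either.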

From SQL Server 2012 SP1 onwards, the login only needs SELECT permission, provided these conditions are met:

  • Users must have permissions on all columns in the statistics object
  • Users must have permission on all columns in a filter condition (if one exists)
  • The table cannot have a row-level security policy.

More information in Distributed Queries – Remote Login Permissions and Execution Plans

Without access to statistical information, the optimizer makes a guess at the number of rows returned by the TVF (one row) and chooses nested loops on that basis.

If you cannot make the linked server login a member of one of the roles mentioned, the best workaround for the specific query in the question might be an OPTION (HASH JOIN) hint, or to read all the data from the remote server into a #temporary table.
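Both workarounds might look roughly like this when applied to the linked-server query from the question. This is an untested sketch. The temp table name #InvcMisc is made up for illustration, and the hint applies to every join in the statement, so the query can fail to compile if some join cannot be executed as a hash join.

```sql
-- Option 1: force hash joins so the remote table is read once
-- instead of being probed once per row returned by the TVF.
select 'E9', InvoiceNum, MiscCode, Description, DocMiscAmt, Type
from tmiedb.Epicor905.Dbo.InvcMisc
where InvoiceNum in (
    select distinct ARInvoiceNum from dbo.UnifiedSalesJournal(2018, 3, default, default, default)
    where DB = 'E9'
)
and MiscCode not like 'SH%'
and MiscCode not like 'HD%'
and MiscCode <> 'THDL'
option (hash join);

-- Option 2: copy the remote rows into a local temp table first
-- (about 55K rows per the question), then filter and join locally.
select InvoiceNum, MiscCode, Description, DocMiscAmt, Type
into #InvcMisc   -- hypothetical temp table name
from tmiedb.Epicor905.Dbo.InvcMisc;

select 'E9', InvoiceNum, MiscCode, Description, DocMiscAmt, Type
from #InvcMisc
where InvoiceNum in (
    select distinct ARInvoiceNum from dbo.UnifiedSalesJournal(2018, 3, default, default, default)
    where DB = 'E9'
)
and MiscCode not like 'SH%'
and MiscCode not like 'HD%'
and MiscCode <> 'THDL';

drop table #InvcMisc;
```

With the temp table, the optimizer works from local row counts and can build statistics on #InvcMisc, so it no longer depends on what the linked server login is allowed to read.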