I have a scalar function that returns a big XML, created over a bunch of invoices.
The exact list of invoices to feed to the function can be calculated using several different ways, but the function is same every time. For this reason, I declared a user-defined table type to contain primary keys from the eInvoice.Header
table and pass it to the function. This way I can have several different functions to decide which invoices to process, and just one function that actually generates XML:
create function eInvoice.GetRelevantLinesInOneWay()
returns table ...
create function eInvoice.GetRelevantLinesInAnotherWay()
returns table ...
create function eInvoice.GetXML(@lines eInvoice.InvoicePrimaryKeys readonly)
returns xml
as
begin
declare @x xml;
with xmlnamespaces(N'important namespace' as pro)
select @x = (
select
...
from
eInvoice.Header h
inner join @lines l on h.ST_PRIMARY = l.invoice_row_id
for xml path(N'pro:Import'), type
);
return @x;
end;
Unfortunately this setup has proven to be very fragile.
Typically @lines
contains about 150 rows (out of about 1m
in eInvoce.Header
). The correct execution plan is to use index seek on ST_PRIMARY
, and that is what always happens when I execute the body of eInvoice.GetXML
as an ad-hoc query.
However when I store it as a function, it works as expected for some little time, then something happens (too many rows in @lines
, like about 300
?), and it decides to change the execution plan to a full scan on eInvoice.Header
and to keep it that way.
With the seek plan the function executes instantly, with the scan plan I don't know how much time it takes, I waited for 30 minutes and cancelled it.
I've tried various ways of forcing the seek plan on the function.
- Adding
with (forceseek)
aftereInvoce.Header
seems to work when I click "Display estimated execution plan," but when I actually execute it, the hint is ignored and a scan is performed. - Adding
option (recompile)
inside the function or outside, in the calling code, does not seem to have an effect. - Capturing the execution plan and hardcoding it inside the function with
option (use plan N'<plan>')
does not work because@lines
contains a primary key on its only column, and the name of that index is different for each instance of@lines
, but the execution plan has to refer to the index by fixed name. - I tried deleting cached execution plans and recompiling the function several times. It sometimes helps for a very limited time and sometimes has no effect.
Is there a way to force the index seek?
Best Answer
Summary
The question does not provide execution plans or a full reproduction script, but based on the information given, you should use a
FAST 1
hint. If you can, you should also consider converting the scalar function to the inline table-valued type.AdventureWorks-based examples
Download the Microsoft sample database here.
Table type
Scalar function
You could try various combinations of
INNER LOOP JOIN
(with the written join order reversed, andFORCE ORDER
added to suppress the warning),FORCESEEK(index_name(columns))
(without schemabinding) and evenQUERYTRACEON(8690)
to prevent a performance spool; but in my experience,FAST 1
is the most reliable way to get the right plan, and simpler as well.In addition to all the usual reasons to avoid scalar functions, they do make plan analysis harder. First, you need to capture actual plans using Profiler or Extended Events - they do not appear in query tools like SSMS. Second, not all hints work as you would expect, for example
RECOMPILE
due to scoping and the separate plans.The code above reliably produces the desired plan for me regardless of the number of rows in the table variable.
Without the
FAST 1
hint in the function, the optimizer chooses a merge join and scan of both tables. This plan is cached and reused for a later function execution whereTOP (300)
is changed toTOP (1)
, reproducing the core of the issue.Inline table-valued function
Unless there are absolutely compelling reasons to stick with the scalar function, you should probably rewrite it to inline table-valued form as follows:
The
TOP (1)
in the cross apply prevents the optimizer choosing anything other than nested loops join. TheFORCESEEK
hint is not required; I have included it to illustrate how it could be used if necessary.The actual execution plan (without using Profiler/XE) is: