Sql-server – When converting a table valued function to inline, why do I get a lazy spool

execution-plansql serversql-server-2005

I have a table valued function that I want to convert to inline to improve performance.
It sets the value of variables to the value four bit columns in one row of a table (looking up the row using a function parameter as the key), then runs one or more of four different insert statements (differing only by the WHERE clause) to populate a temp table based on which of the bit variables have a value of 1, and then returns the content of the temp table.

I rewrote it to be one SELECT (with a CTE to get the four bit column values) using four OR clauses in the WHERE to get the same result.

However, the plan cost jumped from .003 to over 2.5. On the other hand, the actual performance is better judging by the output from SET STATISTICS TIME ON. 75% of the cost of the inline version is for a lazy spool. Is there a good way to avoid the lazy spool and improve performance even more? The output could be anywhere from one row to thousands.

Best Answer

A "lazy" spool, is not really lazy in the "bad" sense of the word. It's lazy in that it may remove the need to re-scan the input data for the given operation, resulting most of the time, in improved performance.

From this MSDN page:

The Lazy Spool logical operator stores each row from its input in a hidden temporary object stored in the tempdb database. If the operator is rewound (for example, by a Nested Loops operator) but no rebinding is needed, the spooled data is used instead of rescanning the input. If rebinding is needed, the spooled data is discarded and the spool object is rebuilt by rescanning the (rebound) input.

The Lazy Spool operator builds its spool file in a "lazy" manner, that is, each time the spool's parent operator asks for a row, the spool operator gets a row from its input operator and stores it in the spool, rather than consuming all rows at once.

You mention in your question that performance has increased by converting the function into an inline variant; I'd say that's winning. Looking at the largest cost operator is not always the best way to optimize your code; for instance that would never inform you that moving from a multi-statement-TVF to an inline-TVF would be so performance enhancing. If you want more specific performance improvements for the code in question, adding that code to your question will help us help you.