Sql-server – Logic of Assigning values to variables in Scalar-valued Functions

functionsoptimizationsql serversubquery

I am currently working on updating old scalar valued functions with the hope of making them more efficient or changing them into Inline TVFs.

While troubleshooting my rewrites I came across some oddities in the original code which do not seem to make sense based on my understanding of variables in SQL

If I were to write

 Declare @var as Integer
 Select @var = col1 from table

Then this would always return the last item in col1. However, in the function that I am working with, this does not seem to be the case. Initially, there was a where clause which was changing results based on the input, but, removing that, the above property should hold. For the most part, it does, however there are a couple of cases where this is not true.

The outline is roughly similar to the following:

create function f (
@var1 ... )
returns datatype
begin
    Declare @temp1 ... (several vars declared here)
    select @temp1 = col1, @temp2 = col2, ..., @tempX = max(colX) from(
        select col1, col2, ... ,
        colX * (case when name = 'X' then 1 else 0)
        from table1 
        INNER JOIN table2 on ...
        inner join table3 on ...
        left join table4 on ...
        where x) as sq
    group by ...
    return ...
end)

While more convoluted than the basic example above, with the where clause removed, it seems that it should be logically equivalent, however, If i run the function and look at what it's actually returning, it is not always the same value. Please note, that for testing purposes, the function is only returning one of the variables assigned through the select. Which, assuming similar functionality to the simple case given above, should always return the same value.

The values that are passed in as parameters should not affect this specific statement (this is a subset of the whole function, but I've created a smaller function to test this part of the code and decided to keep the same values passed in for consistency's sake). Only the newly declared variables are used and given values; excluding the where clause which I have commented out during my testing.

Then, my question is, how would these tables change in order to change the last element if the joins are always going to be the same and there is no where clause to change the values in the joins? grouping should be consistent as well since all of the tables should be the same every time the function is run.

The subquery should always return the same table and then the variables should be assigned the last value in their specific column, meaning that the return for the function should always be the same value.

Best Answer

Then this would always return the last item in col1.

What does "last item" mean? In SQL Server, and indeed most if not all RDBMSs, there is no concept of "last" unless you specify an ORDER BY clause.

The apparent order of results can change drastically if the plan changes due to things like parameter sniffing, updated statistics, new or modified indexes, etc., etc.

Never1 rely on observed order behavior without an explicit ORDER BY.

This question and the attached answers make for interesting reading on the topic, as well.


Yes, I did just say never. This instance is an exception to the rule "never say never."