Sql-server – SQLServer Inconsistency between changing values for columns vs variables

sql server

Two similar situations outlined below. The first being table columns where we run a series of calculations across columns a, b and c. We do the same thing using declared variables and the result is different to the answer for columns.

It appears that SQL Server persists the value of the table columns for the duration of the command whereas the values of the variables will change during the command.

In a case where you mix variables and table columns the results are still different – but consistent to the descriptions above.

In other RDBMS's that we use there is no difference between these two. One poignant example of where these behave the same is Sybase.

So why is there a difference? Is there something that can alter this behaviour and make it consistent?

/* Column version */

create table #CalcTest (
  a int not null,
  b int not null,
  c int not null
)

insert into #CalcTest ( a, b, c )
values ( 4, 14, 24 )

update #CalcTest
set  b = a + c,
     a = 5,
     c = b - a

select a, b, c from #CalcTest

--output 5 28 10
--Note further - if you break up the update command into 3 commands c = 23

/* Variable version */

declare @a int, @b int, @c int

select @a = 4, @b = 14, @c = 24

select    @b = @a + @c,
          @a = 5,
          @c = @b - @a

select @a, @b, @c

--output 5 28 23
--Note c is 23 where it was previously 10!

Best Answer

It is just how SQL Server works. More specifically, how its UPDATE statement works.

One SELECT statement with variables in your example is equivalent to three separate SET statements.

SET @b = @a + @c;
SET @a = 5;
SET @c = @b - @a;

The calculations here are sequential. The order in which you write these lines is important as in traditional programming language.

UPDATE statement is not sequential in this sense. It reads current values of all columns of a row into memory (internal variables behind the scene) and uses these cached values in calculations. These cached values are not changed. The final result is written back onto disk in one go, not column by column.

I think, the most vivid example would be an example how you can swap values of two columns with this statement:

update #CalcTest
set  b = a,
     a = b

In traditional procedural languages you have to use third variable to store the value:

temp = a;
a = b;
b = temp;

UPDATE in SQL Server has this variable implicitly. I don't know how other DBMSs implement UPDATE, but at least Postgres and Oracle seem to work in the same way as SQL Server. While MySQL not.

EDIT:

Before I mark this as the correct answer I will clarify and say that it's not so much how updates work but that it is just how columns work and, no, we can't change or control it.

My interpretation from this and other research elsewhere is this:

Columns are updated atomically in SqlServer. They get their new values all at the same time at the conclusion of the statement. Hence no partial updates or updates of one and not the other. All or nothing.
Variables (even in the same statement as columns) are not updated atomically. They get their new value part way through the statement one after another and their changes can feed into each other as they go. This is just like most programming languages.
Also, SqlServer will not guarantee the order in which variables are updated in a Select (and I suspect the same is true of Updates). This is probably due to the optimiser and how it schedules the cost of complex instructions like sub-selects. Simple stuff like that presented here has never (as far as I can see) been processed in anything other that the order it's written in. Just don't rely on that order.
Microsoft recommends using the Set command to set variable values where that statement does not refer to tables or views, i.e. like those presented here. However, you can't chain up variables in a Set like you can in a Select. Also, Microsoft say this and then go right ahead and continue using Select to set variables in their examples. :-)

Related Solutions

Sql-server – When converting a table valued function to inline, why do I get a lazy spool

A "lazy" spool, is not really lazy in the "bad" sense of the word. It's lazy in that it may remove the need to re-scan the input data for the given operation, resulting most of the time, in improved performance.

From this MSDN page:

The Lazy Spool logical operator stores each row from its input in a hidden temporary object stored in the tempdb database. If the operator is rewound (for example, by a Nested Loops operator) but no rebinding is needed, the spooled data is used instead of rescanning the input. If rebinding is needed, the spooled data is discarded and the spool object is rebuilt by rescanning the (rebound) input.

The Lazy Spool operator builds its spool file in a "lazy" manner, that is, each time the spool's parent operator asks for a row, the spool operator gets a row from its input operator and stores it in the spool, rather than consuming all rows at once.

You mention in your question that performance has increased by converting the function into an inline variant; I'd say that's winning. Looking at the largest cost operator is not always the best way to optimize your code; for instance that would never inform you that moving from a multi-statement-TVF to an inline-TVF would be so performance enhancing. If you want more specific performance improvements for the code in question, adding that code to your question will help us help you.

Sql-server – How to update ID=null values in table to incremental counter values

You can do this.

WITH T
     AS (SELECT ISNULL((SELECT MAX(ID) FROM StagingTable), 0) + 
                    ROW_NUMBER() OVER (ORDER BY NaturalID) AS New_ID,
                ID
         FROM   StagingTable
         WHERE  ID IS NULL)
UPDATE T
SET    ID = New_ID

So the windowed function is used in the SELECT list but you can still use the result of it to UPDATE the column.

You should probably have a filtered index unique constraint on ID WHERE ID IS NOT NULL to prevent duplicates too. Or run this at serializable isolation level to block concurrent inserts.

Best Answer

Related Solutions

Sql-server – When converting a table valued function to inline, why do I get a lazy spool

Sql-server – How to update ID=null values in table to incremental counter values

Related Question