Sql-server – SQLServer Inconsistency between changing values for columns vs variables

sql server

Two similar situations outlined below. The first being table columns where we run a series of calculations across columns a, b and c. We do the same thing using declared variables and the result is different to the answer for columns.

It appears that SQL Server persists the value of the table columns for the duration of the command whereas the values of the variables will change during the command.

In a case where you mix variables and table columns the results are still different – but consistent to the descriptions above.

In other RDBMS's that we use there is no difference between these two. One poignant example of where these behave the same is Sybase.

So why is there a difference? Is there something that can alter this behaviour and make it consistent?

/* Column version */

create table #CalcTest (
  a int not null,
  b int not null,
  c int not null
)

insert into #CalcTest ( a, b, c )
values ( 4, 14, 24 )

update #CalcTest
set  b = a + c,
     a = 5,
     c = b - a

select a, b, c from #CalcTest

--output 5 28 10
--Note further - if you break up the update command into 3 commands c = 23

/* Variable version */

declare @a int, @b int, @c int

select @a = 4, @b = 14, @c = 24

select    @b = @a + @c,
          @a = 5,
          @c = @b - @a

select @a, @b, @c

--output 5 28 23
--Note c is 23 where it was previously 10!

Best Answer

It is just how SQL Server works. More specifically, how its UPDATE statement works.

One SELECT statement with variables in your example is equivalent to three separate SET statements.

SET @b = @a + @c;
SET @a = 5;
SET @c = @b - @a;

The calculations here are sequential. The order in which you write these lines is important as in traditional programming language.

UPDATE statement is not sequential in this sense. It reads current values of all columns of a row into memory (internal variables behind the scene) and uses these cached values in calculations. These cached values are not changed. The final result is written back onto disk in one go, not column by column.

I think, the most vivid example would be an example how you can swap values of two columns with this statement:

update #CalcTest
set  b = a,
     a = b

In traditional procedural languages you have to use third variable to store the value:

temp = a;
a = b;
b = temp;

UPDATE in SQL Server has this variable implicitly. I don't know how other DBMSs implement UPDATE, but at least Postgres and Oracle seem to work in the same way as SQL Server. While MySQL not.


EDIT:

Before I mark this as the correct answer I will clarify and say that it's not so much how updates work but that it is just how columns work and, no, we can't change or control it.

My interpretation from this and other research elsewhere is this:

  • Columns are updated atomically in SqlServer. They get their new values all at the same time at the conclusion of the statement. Hence no partial updates or updates of one and not the other. All or nothing.

  • Variables (even in the same statement as columns) are not updated atomically. They get their new value part way through the statement one after another and their changes can feed into each other as they go. This is just like most programming languages.

  • Also, SqlServer will not guarantee the order in which variables are updated in a Select (and I suspect the same is true of Updates). This is probably due to the optimiser and how it schedules the cost of complex instructions like sub-selects. Simple stuff like that presented here has never (as far as I can see) been processed in anything other that the order it's written in. Just don't rely on that order.

  • Microsoft recommends using the Set command to set variable values where that statement does not refer to tables or views, i.e. like those presented here. However, you can't chain up variables in a Set like you can in a Select. Also, Microsoft say this and then go right ahead and continue using Select to set variables in their examples. :-)