Sql-server – Simple SQL CTE update

ctesql serverupdate

I am a little stumped with this CTE update stmt:

DECLARE @a TABLE (ID int, Value int);
DECLARE @b TABLE (ID int, Value int);
INSERT @a VALUES (1, 10), (2, 20);
INSERT @b VALUES (1, 100),(2, 200);

WITH cte AS 
(
    SELECT * FROM @a
)
UPDATE cte
SET    Value = b.Value
FROM   cte AS a
INNER JOIN @b AS b 
ON     b.ID = a.ID

SELECT * FROM @a
GO

Why does this result in table @a having 100 for both rows? I thought it should be 100 for ID 1 and 200 for ID 2.

I get expected results if I use a table instead of a common table expression to do the updates:

DECLARE @a TABLE (ID int, Value int);
DECLARE @b TABLE (ID int, Value int);
INSERT @a VALUES (1, 10), (2, 20);
INSERT @b VALUES (1, 100),(2, 200);

SELECT  *
FROM    @a

UPDATE @a
SET Value = b.Value
FROM @a AS XX
INNER JOIN @b AS b ON b.ID = xx.ID

SELECT  *
FROM    @a

This results in table @a with 100 and 200. But aren't we supposed to get both the same values? based on the previous explanation of referencing issue? — Update being done to @a table and not the referenced XX.

Best Answer

To expand on MguerraTorres' answer:

(Updated with the info from your secondary query)

In your first query UPDATE cte says to update the table from the CTE.

FROM cte as a says to refer to the table from the CTE as a.

So, we've referred to our CTE in two places.

What you may not realize is that a CTE is re-evaluated for each time it appears in your query, just as if you replaced the reference with a subquery. Since you've referenced the CTE two separate times, you've generated two separate resultsets for the DB engine to work with.

When you say to use b.Value where a.ID = b.ID, we've got two rows - one where b.Value is 100, and one where it's 200 - from table b and from our second CTE resultset.

However, we're updating the first CTE resultset based on these two rows. Therefore, it updates each row in that first resultset from the two rows returned. There's no relationship between the two resultsets, even though they represent the same underlying data. The engine is doing a CROSS JOIN between the results of your join, and the first resultset, to perform the update.

Your UPDATE statement updates both of your rows to to 200, then to 100 (because the engine decides the quickest way to apply the cross-joined rows, they may not go in the order in which they were entered). Both rows are updated to the same value because they're being updated from the same multiple rows.

Your first query is functionally identical to:

DECLARE @a TABLE (ID int, Value int);
DECLARE @b TABLE (ID int, Value int);
INSERT @a VALUES (1, 10), (2, 20);
INSERT @b VALUES (1, 100),(2, 200);


WITH cte AS 
(
    SELECT * FROM @a
)
UPDATE cte
SET    Value = b.Value
FROM   (SELECT * FROM @a) AS a
INNER JOIN @b AS b 
ON     b.ID = a.ID

SELECT * FROM @a
GO

In your second query, the DB engine knows that both a and @a reference a table outside the query, and it knows that a and @a mean the same thing, so it correctly ties the rows from @b to @a when performing the update.


In the comments, you asked:

Would the result be always be 100 for both? or can it be 200 for both sometimes -- As I see there is no clear rule here?

Whether it's 100 or 200 can vary.

I would say that it's likely that, given the same statements shown in your first query, executed in the same way, you would almost certainly get the same outcome.

However, in the real world, with tables seeing other activity, you couldn't really on one outcome or the other, especially over time. It would depend on how the DB engine matched the tables in the join, and then processed the rows in applying the update.