In SQL databases does an update rewrite all columns in the row or only those in the SET tuple

performancerdbmsupdate

i was wondering how a sql update works in two scenarios:

table with 3 columns: a, b, c

say the data value for a has to be changed.

1: update on a row setting all columns a, b, c (where b and c have not changed from what is stored)

2: update on a row setting only the a column value (because b and c have not changed)

is there any performance difference between these? im having a hard time finding information about how they actually work behind the scenes. from the few articles i found my understanding is that in an update the db will:

load the row (all columns) into memory
set the values from the set tuple (in order)
write the updated row back

Best Answer

This is going to vary between databases, but there are things that we can say are generally (but certainly not always) true of common databases such as SQL Server, Postgres, etc.

Drive IO generally works in blocks, usually these days of 4Kbyte or 0.5K, and to write any of a block the drive will write the whole block. Database engines organise their structures into pages for various reasons including to mach the IO subsystem. In MS SQL Server each page is 8K, done databases allow you to choose on a per DB or per server basis. As with the underlying drives writing to a page will result in rewriting the whole page (which may be multiple disk blocks in size) not just the part constituting the updated row(s) or the individual columns in the row(s). So the same amount of IO will be done in your two examples (SET a & SET a, b, c).

In fact more than just that page will be affected: an entry will be made in the transaction log first, and relevant index pages may need to be updated too. The MVCC model used (or supported as an option) by some database systems as extra details to the question, but that detail may be getting a bit deeper than the intended scope of your question.

Another point that does seem relevant is that this IO will happen even if nothing really changes: if you are seeing column a to 42 and it is already 42 the writes will still happen. Further more, any relevant triggers will still fire, constraints will be checked, etc. This is why you sometimes see statements like UPDATE someTable SET aColumn = @aValue WHERE aColumn <> @aValue, to avoid excess activity (if doing this in your code, be sure to be careful to account for NULLs if they might be present, which my example does not for the sake of being concise, also be aware that there are circumstances where you might want triggers to fire even if an update otherwise effectively does nothing).

It is worth reiterating before finishing this answer, that none of this is true for all databases, or all IO more generally, but is only commonly true.

Related Solutions

Sql-server – Auto Update Statistics in SQL Server 2008R2: Why do some statistics remain stale despite a large number of row inserts

Some info, if not a definitive answer

It's been blogged recently

When are Statistics Updated – What triggers Statistics to Update
Comment from Paul Randal

There is a whitepaper too. See the section "Maintaining Statistics in SQL Server 2008" where there are some conditions that sound like affect you. Example:

One limitation of the automatic update logic is that it tracks changes to columns in the statistics, but not changes to columns in the predicate. If there are many changes to the columns used in predicates of filtered statistics, consider using manual updates to keep up with the changes.

At the end there are some settings to check too: what if OFF at the DB level which overrides an ON at the index/stat level?

HTH...

How does optimistic locking actually enforce re-read/update

The basic technique is quite straightforward. When you read the record you take a note of the version or timestamp column, e.g.

Select FooID
      ,Foo
      ,Bar
      ,TS      -- timestamp
  from Foobar
 where FooID = @FooID

When you go to write out the record you filter the write by the timestamp/version so that the write writes nothing if the timestamp/version has changed. This makes the write atomic, e.g.

update Foo
   set Foo = @foo
      ,Bar = @bar
      ,TS = @timestamp  
 where FooID = @FooID
   and ts = @timestamp

select @row_count = @@rowcount  -- specific to t-sql, but this is a system variable
                                -- that holds the number of rows affected by the
if @@rowcount = 0               -- most recent operation.  Other DBMS platforms do
    [deal with outdated record] -- this differently.

This allows an application to do the update without holding locks open. This is necessary for n-tier systems working through a connection pool, and prevents a class of deadlocks that used to be common on two-tier client-server systems.

There is nothing enforced in the database about this. It's all done explicitly by the application.

Best Answer

Related Solutions

Sql-server – Auto Update Statistics in SQL Server 2008R2: Why do some statistics remain stale despite a large number of row inserts

How does optimistic locking actually enforce re-read/update

Related Question