The basic technique is quite straightforward. When you read the record you take a note of the version or timestamp column, e.g.
select FooID
      ,Foo
      ,Bar
      ,TS -- timestamp/version column
from Foobar
where FooID = @FooID
When you go to write the record back, you filter the write by the timestamp/version so that the UPDATE affects no rows if the timestamp/version has changed since the read. This makes the check-and-write atomic, e.g.
update Foobar
set Foo = @foo
   ,Bar = @bar
   ,TS = @new_timestamp -- write a new version value, otherwise a concurrent
                        -- writer's check against the old value would still pass
where FooID = @FooID
and TS = @timestamp -- the value read earlier
select @row_count = @@rowcount -- specific to T-SQL: @@rowcount is a system variable
                               -- that holds the number of rows affected by the most
                               -- recent statement. Other DBMS platforms do this
                               -- differently. Capture it immediately, because the
                               -- select itself resets @@rowcount.
if @row_count = 0
[deal with outdated record]
This allows an application to do the update without holding locks open. This is necessary
for n-tier systems working through a connection pool, and prevents a class of deadlocks
that used to be common on two-tier client-server systems.
There is nothing enforced in the database about this. It's all done explicitly by the application.
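The whole read-check-write cycle can be sketched end to end in any client library. Here is a minimal sketch using Python's sqlite3, with an integer version column standing in for a native timestamp/rowversion type (the table and column names follow the example above; the functions are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Foobar (FooID INTEGER PRIMARY KEY, Foo TEXT, Bar TEXT, TS INTEGER)"
)
conn.execute("INSERT INTO Foobar VALUES (1, 'a', 'b', 0)")
conn.commit()

def read_record(foo_id):
    # Read the row and remember the version (TS) that came with it.
    return conn.execute(
        "SELECT FooID, Foo, Bar, TS FROM Foobar WHERE FooID = ?", (foo_id,)
    ).fetchone()

def write_record(foo_id, foo, bar, ts_read):
    # Filter the write by the version we read; bump the version on success.
    cur = conn.execute(
        "UPDATE Foobar SET Foo = ?, Bar = ?, TS = TS + 1 "
        "WHERE FooID = ? AND TS = ?",
        (foo, bar, foo_id, ts_read),
    )
    conn.commit()
    return cur.rowcount  # 0 means someone else updated the row first

_, _, _, ts = read_record(1)
assert write_record(1, 'a2', 'b2', ts) == 1  # first writer wins
assert write_record(1, 'a3', 'b3', ts) == 0  # stale version: write rejected
```

No lock is held between the read and the write; the stale writer simply sees a row count of zero and must re-read before retrying.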
If you don't want to interfere with other activity, doing one UPDATE at a time in autocommit mode is very likely the best option. You should probably set synchronous_commit = off in that session (and only that session).
The indexes are going to slow you down, perhaps by a lot depending on your RAM and your IO system. But if the index is necessary for the other actions you don't want to interfere with, then there isn't anything you can do about it.
But since the fid is not yet correctly populated, the index on it is probably not actually useful to the concurrent processes you want to avoid interfering with, as they haven't yet been changed to rely on that column being accurate. If that is the case, you can drop that index to gain speed and rebuild it in bulk later. The same probably applies to the foreign key constraint.
Once that index is gone, your updates can proceed via HOT (Heap Only Tuples) updates provided each block has enough free space. In that case, the updates will not have to do maintenance on the primary key index, either, saving that much more IO. To maximize the likelihood that this will work optimally, it is important that each UPDATE be its own transaction. That way one UPDATE can reuse space freed up by an earlier one.
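The "each UPDATE its own transaction" advice looks like this from the client side. A sketch in Python, using sqlite3 in autocommit mode as a stand-in for a PostgreSQL connection (with psycopg2 you would set conn.autocommit = True instead; the table and values are made up):

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode: each statement is its
# own transaction, so space freed up by one UPDATE can be reused by a later one.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, fid INTEGER)")
conn.executemany("INSERT INTO t (id) VALUES (?)", [(i,) for i in range(5)])

for row_id in range(5):
    # One UPDATE, one transaction -- no explicit BEGIN/COMMIT wrapping the loop.
    conn.execute("UPDATE t SET fid = ? WHERE id = ?", (row_id * 10, row_id))

assert conn.in_transaction is False  # nothing is left open between statements
```

The key point is the absence of a single long transaction around the loop; wrapping all the updates in one transaction would prevent the space reuse described above.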
Also, your WHERE clause should probably be like:
WHERE id=345 and fid is not null;
That way if the script gets interrupted, you can re-run it with minimal damage.
Since you seem to be running this on a test system already, an EXPLAIN (ANALYZE, BUFFERS) of some of the updates would be helpful, especially with track_io_timing set to on.
Best Answer
If it were me, rather than flagging the current row, I would just timestamp each entry. Then you would not need to update anything; you would only need to insert a new row. As a side benefit, this lets you query sets of data to plot against time.
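That insert-only design might look like this. A minimal sketch with Python's sqlite3, where the "current" entry is simply the newest row for a key (the schema and names are illustrative, not from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE readings (
    sensor_id INTEGER,
    value     REAL,
    ts        TEXT DEFAULT (datetime('now'))
)""")

# Never UPDATE: each new state is just a new row.
conn.execute(
    "INSERT INTO readings (sensor_id, value, ts) VALUES (1, 10.0, '2024-01-01 00:00:00')"
)
conn.execute(
    "INSERT INTO readings (sensor_id, value, ts) VALUES (1, 12.5, '2024-01-02 00:00:00')"
)
conn.commit()

def current(sensor_id):
    # The "current" value is the row with the latest timestamp for that key.
    return conn.execute(
        "SELECT value FROM readings WHERE sensor_id = ? ORDER BY ts DESC LIMIT 1",
        (sensor_id,),
    ).fetchone()[0]

def history(sensor_id):
    # Side benefit: the full series is available to plot against time.
    return conn.execute(
        "SELECT ts, value FROM readings WHERE sensor_id = ? ORDER BY ts",
        (sensor_id,),
    ).fetchall()

assert current(1) == 12.5
```

Because writes are inserts, there is no update contention at all, and nothing to flag or unflag when the current value changes.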