The basic technique is quite straightforward. When you read the record you take a note of the version or timestamp column, e.g.
select FooID
      ,Foo
      ,Bar
      ,TS -- timestamp/version column
from Foobar
where FooID = @FooID
When you go to write the record back, you filter the write by the timestamp/version so that the UPDATE affects no rows if the timestamp/version has changed since the read. This makes the check-and-write atomic, e.g.
update Foobar
set Foo = @foo
   ,Bar = @bar
   ,TS = @new_timestamp -- write a new version value, otherwise a concurrent
                        -- writer's check against the old value would still pass
where FooID = @FooID
and TS = @timestamp -- the value read earlier
select @row_count = @@rowcount -- specific to T-SQL: @@rowcount is a system variable
                               -- that holds the number of rows affected by the most
                               -- recent statement. Other DBMS platforms do this
                               -- differently. Capture it immediately, because the
                               -- select itself resets @@rowcount.
if @row_count = 0
[deal with outdated record]
This allows an application to do the update without holding locks open. This is necessary
for n-tier systems working through a connection pool, and prevents a class of deadlocks
that used to be common on two-tier client-server systems.
There is nothing enforced in the database about this. It's all done explicitly by the application.
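The whole read-check-write cycle can be sketched end to end in any client library. Here is a minimal sketch using Python's sqlite3, with an integer version column standing in for a native timestamp/rowversion type (the table and column names follow the example above; the functions are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Foobar (FooID INTEGER PRIMARY KEY, Foo TEXT, Bar TEXT, TS INTEGER)"
)
conn.execute("INSERT INTO Foobar VALUES (1, 'a', 'b', 0)")
conn.commit()

def read_record(foo_id):
    # Read the row and remember the version (TS) that came with it.
    return conn.execute(
        "SELECT FooID, Foo, Bar, TS FROM Foobar WHERE FooID = ?", (foo_id,)
    ).fetchone()

def write_record(foo_id, foo, bar, ts_read):
    # Filter the write by the version we read; bump the version on success.
    cur = conn.execute(
        "UPDATE Foobar SET Foo = ?, Bar = ?, TS = TS + 1 "
        "WHERE FooID = ? AND TS = ?",
        (foo, bar, foo_id, ts_read),
    )
    conn.commit()
    return cur.rowcount  # 0 means someone else updated the row first

_, _, _, ts = read_record(1)
assert write_record(1, 'a2', 'b2', ts) == 1  # first writer wins
assert write_record(1, 'a3', 'b3', ts) == 0  # stale version: write rejected
```

No lock is held between the read and the write; the stale writer simply sees a row count of zero and must re-read before retrying.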
If you don't want to interfere with other activity, doing one UPDATE at a time in autocommit mode is very likely the best option. You should probably set synchronous_commit = off in that session (and only that session).
The indexes are going to slow you down, perhaps by a lot depending on your RAM and your IO system. But if the index is necessary for the other actions you don't want to interfere with, then there isn't anything you can do about it.
But since the fid is not yet correctly populated, the index on it is probably not actually useful to the concurrent processes you want to avoid interfering with, as they haven't yet been changed to rely on that column being accurate. If that is the case, you can drop that index to gain speed and rebuild it in bulk later. The same probably applies to the foreign key constraint.
Once that index is gone, your updates can proceed via HOT (Heap Only Tuples) updates provided each block has enough free space. In that case, the updates will not have to do maintenance on the primary key index, either, saving that much more IO. To maximize the likelihood that this will work optimally, it is important that each UPDATE be its own transaction. That way one UPDATE can reuse space freed up by an earlier one.
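The "each UPDATE its own transaction" advice looks like this from the client side. A sketch in Python, using sqlite3 in autocommit mode as a stand-in for a PostgreSQL connection (with psycopg2 you would set conn.autocommit = True instead; the table and values are made up):

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode: each statement is its
# own transaction, so space freed up by one UPDATE can be reused by a later one.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, fid INTEGER)")
conn.executemany("INSERT INTO t (id) VALUES (?)", [(i,) for i in range(5)])

for row_id in range(5):
    # One UPDATE, one transaction -- no explicit BEGIN/COMMIT wrapping the loop.
    conn.execute("UPDATE t SET fid = ? WHERE id = ?", (row_id * 10, row_id))

assert conn.in_transaction is False  # nothing is left open between statements
```

The key point is the absence of a single long transaction around the loop; wrapping all the updates in one transaction would prevent the space reuse described above.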
Also, your WHERE clause should probably be like:
WHERE id=345 and fid is not null;
That way if the script gets interrupted, you can re-run it with minimal damage.
Since you seem to be running this on a test system already, an EXPLAIN (ANALYZE, BUFFERS) of some of the updates would be helpful, especially with track_io_timing set to on.
Best Answer
If it were me, rather than flagging the current row, I would just timestamp each entry. Then you would not need to update anything; you would only need to insert a new row. As a side benefit, this lets you query sets of data to plot against time.
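That insert-only design might look like this. A minimal sketch with Python's sqlite3, where the "current" entry is simply the newest row for a key (the schema and names are illustrative, not from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE readings (
    sensor_id INTEGER,
    value     REAL,
    ts        TEXT DEFAULT (datetime('now'))
)""")

# Never UPDATE: each new state is just a new row.
conn.execute(
    "INSERT INTO readings (sensor_id, value, ts) VALUES (1, 10.0, '2024-01-01 00:00:00')"
)
conn.execute(
    "INSERT INTO readings (sensor_id, value, ts) VALUES (1, 12.5, '2024-01-02 00:00:00')"
)
conn.commit()

def current(sensor_id):
    # The "current" value is the row with the latest timestamp for that key.
    return conn.execute(
        "SELECT value FROM readings WHERE sensor_id = ? ORDER BY ts DESC LIMIT 1",
        (sensor_id,),
    ).fetchone()[0]

def history(sensor_id):
    # Side benefit: the full series is available to plot against time.
    return conn.execute(
        "SELECT ts, value FROM readings WHERE sensor_id = ? ORDER BY ts",
        (sensor_id,),
    ).fetchall()

assert current(1) == 12.5
```

Because writes are inserts, there is no update contention at all, and nothing to flag or unflag when the current value changes.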