Postgresql – Postgres record locking issue – could asynchronous UPDATES be a solution

performance | postgresql-9.3 | postgresql-performance

Is it possible (and sensible) to issue asynchronous UPDATEs to a postgres table?

Here’s the issue. I have two processes, both of which access the same table.

Process “A” loads vast quantities of external records into a Postgres table; this involves a trigger that updates a summary table (the summary table is the one involved in the conflict).
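For reference, a minimal sketch of what such a trigger might look like. The table and column names (raw_records, summary, amount_a, etc.) are hypothetical, not taken from the actual schema:

    -- Hypothetical sketch (names are illustrative): a row-level trigger on the
    -- load table that bumps the matching summary row after every insert.
    CREATE OR REPLACE FUNCTION maintain_summary() RETURNS trigger AS $$
    BEGIN
        UPDATE summary
           SET a = a + NEW.amount_a,
               b = b + NEW.amount_b,
               c = c + 1
         WHERE id = NEW.summary_id;   -- takes a row lock held until COMMIT
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER load_maintains_summary
    AFTER INSERT ON raw_records
    FOR EACH ROW EXECUTE PROCEDURE maintain_summary();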

Process “A” is quite time-critical. By tweaking the number of records in each INSERT statement and the number of INSERTs before issuing a COMMIT, I can insert about 4,000 records in a single transaction that takes between 5 and 8 seconds.
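In outline the load looks something like this (figures and names purely illustrative):

    -- e.g. several multi-row INSERTs per transaction, ~4,000 rows per COMMIT
    BEGIN;
    INSERT INTO raw_records (summary_id, amount_a, amount_b) VALUES
        (1, 10.0, 2.5),
        (1, 11.0, 3.0),
        (2,  7.5, 1.0);   -- ... hundreds of rows per statement
    -- ... further INSERT statements ...
    COMMIT;               -- the trigger's row locks on summary are held until here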

Process “B” reads data from the database, does a lot of number crunching on it and then UPDATEs the summary table. The problem is that although the reading and processing can be done in milliseconds, the UPDATE of the summary table typically waits 4 to 5 seconds for process “A”’s transaction to release its locks on the summary table. This is very inefficient and dramatically slows down the overall process.
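A minimal illustration of the conflict, again with hypothetical names:

    -- Session 1 (process "A"): long-running load transaction
    BEGIN;
    INSERT INTO raw_records (summary_id, amount_a, amount_b) VALUES (42, 1.0, 2.0);
    -- the trigger UPDATEs summary row id = 42, locking that row until COMMIT

    -- Session 2 (process "B"): finishes its number crunching, then...
    UPDATE summary SET d = 1.23, e = 4.56, f = 7.89 WHERE id = 42;
    -- blocks here until session 1 COMMITs, even though D/E/F are different
    -- columns: PostgreSQL row locks cover the whole row, not individual columns.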

The following may (or may not) also be relevant.

  • Process “A” updates only fields “A”, “B” and “C” in the summary
    table.
  • Process “B” updates only fields “D”, “E” and “F” in the summary
    table.
  • Both process “A” and “B” only update the summary table by
    explicitly identifying single rows via the unique primary key.
  • No other processes read or update the summary table.
  • Only the latest version (for any one record) of the values in “D”, “E” and “F” is relevant.

Bearing in mind the above, is it possible for process “B” to issue its UPDATEs to the summary table asynchronously (by this I mean issue the UPDATE and then carry on with the next iteration of the process without waiting for the response to the UPDATE)? If it is possible, how? And perhaps more importantly, is it sensible?

Thanks

Best Answer

Having thought a bit more about the problem, I realise there's actually a relatively simple solution. Since each process only updates its 'own' fields, the solution is to split the data into two tables, allowing each half to be updated by its own process; any SELECTs can be done by joining on the common primary key.
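A sketch of the split (table and column names hypothetical): one table per writer, sharing the same primary key, plus a view that reassembles the original shape for readers.

    -- Written only by process "A" (via the load trigger)
    CREATE TABLE summary_load (
        id integer PRIMARY KEY,
        a  numeric,
        b  numeric,
        c  numeric
    );

    -- Written only by process "B"
    CREATE TABLE summary_calc (
        id integer PRIMARY KEY REFERENCES summary_load (id),
        d  numeric,
        e  numeric,
        f  numeric
    );

    -- Readers see the combined row shape
    CREATE VIEW summary_all AS
    SELECT id, sl.a, sl.b, sl.c, sc.d, sc.e, sc.f
      FROM summary_load sl
      LEFT JOIN summary_calc sc USING (id);

Because each process now takes row locks only on its own table, process "B"'s UPDATEs no longer wait for process "A"'s long load transactions.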