Postgresql – BEGIN / COMMIT atomic update

postgresqltransaction

I try to generate unique identifiers for the ticket table's records.
A ticket belongs to a file record, which also have an identifier.
If the file's identifier is 'SOMEPREFIX/F01' then its tickets should be
'SOMEPREFIX/F01/PT001', 'SOMEPREFIX/F01/PT002', 'SOMEPREFIX/F01/PT003' …

The architecture: JS Client, a node.js API with some ORM, but I am also able to run raw SQL statements. DB: PG 9.6

The problem: If the client send (almost) the same time a bunch of ticket create requests, the DB generates sometimes the same identifier.

Here is an sql query I generate dynamically in the API after the ticket is already inserted:

      BEGIN;
      UPDATE ticket SET 
      ticket_number_index = (COALESCE((SELECT MAX(ticket_number_index) 
                        FROM ticket 
                        WHERE file_id = 'D530'), 0) + 1),
      ticket_number = CONCAT('SOMEPREFIX\F717\PT', TO_CHAR((COALESCE((SELECT MAX(ticket_number_index) 
                        FROM ticket 
                        WHERE file_id = 'D530'), 0)  + 1), 'fm00000'))
      WHERE id = 'D3571';
      SELECT * FROM ticket WHERE id = 'D3571';
      COMMIT;

I would assume that BEGIN and COMMIT makes the code atomic, and after the first UPDATE the next one can't produce the same identifier.
Although I have just saved saved two tickets (two POST queries started almost the same time by the client), and I ended up with 'SOMEPREFIX\F717\PT1' and 'SOMEPREFIX\F717\PT1'

Best Answer

Your approach is broken because of the atomic transactions.

Any change to the table is not visible to other transactions until the change is committed.

So if three transactions are started at the same time, the max() value will be the same for all three of them resulting in the same generated identifier. The only way to avoid this, is to exclusively lock the entire table before you insert a row. Which means you can not have more than one transaction inserting rows at one given time.

In general using this approach is a really bad idea because implementations are either broken (as you have discovered) or they won't scale because of the locking needed.

The only scalable and safe way to generate unique numbers is to use a sequence.

There is also no need to store the prefix twice, just store the prefix e.g.: 'SOMEPREFIX\F717\PT' in one column and store the unique identifier generated by a sequence in another column. If you need to display them as one, do that in the application.