Postgresql – BEGIN / COMMIT atomic update

postgresql transaction

I am trying to generate unique identifiers for the records of the ticket table.
A ticket belongs to a file record, which also has an identifier.
If the file's identifier is 'SOMEPREFIX/F01', then its tickets should be
'SOMEPREFIX/F01/PT001', 'SOMEPREFIX/F01/PT002', 'SOMEPREFIX/F01/PT003' …

The architecture: a JS client and a Node.js API with an ORM, although I can also run raw SQL statements. DB: PostgreSQL 9.6

The problem: if the client sends a bunch of ticket-creation requests at (almost) the same time, the DB sometimes generates the same identifier.

Here is an SQL query that I generate dynamically in the API after the ticket has already been inserted:

      BEGIN;
      UPDATE ticket SET 
      ticket_number_index = (COALESCE((SELECT MAX(ticket_number_index) 
                        FROM ticket 
                        WHERE file_id = 'D530'), 0) + 1),
      ticket_number = CONCAT('SOMEPREFIX\F717\PT', TO_CHAR((COALESCE((SELECT MAX(ticket_number_index) 
                        FROM ticket 
                        WHERE file_id = 'D530'), 0)  + 1), 'fm00000'))
      WHERE id = 'D3571';
      SELECT * FROM ticket WHERE id = 'D3571';
      COMMIT;

I would assume that BEGIN and COMMIT make the code atomic, so that after the first UPDATE the next one can't produce the same identifier.
However, I have just saved two tickets (two POST requests started at almost the same time by the client), and I ended up with 'SOMEPREFIX\F717\PT1' and 'SOMEPREFIX\F717\PT1'.

Best Answer

Your approach is broken precisely because transactions are atomic and isolated from one another.

Any change to the table is not visible to other transactions until the change is committed.
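The visibility rule can be reproduced by hand. The following is only a sketch, assuming two separate psql sessions, PostgreSQL's default READ COMMITTED isolation level, and the table and ids from the question:

```sql
-- Session A: assign the next index, but do not commit yet.
BEGIN;
UPDATE ticket
SET ticket_number_index = (SELECT COALESCE(MAX(ticket_number_index), 0) + 1
                           FROM ticket
                           WHERE file_id = 'D530')
WHERE id = 'D3571';

-- Session B: runs while session A's transaction is still open.
BEGIN;
SELECT COALESCE(MAX(ticket_number_index), 0) + 1 AS next_index
FROM ticket
WHERE file_id = 'D530';
-- Session A's new value is uncommitted and therefore invisible here,
-- so session B computes the same next_index that session A just used.
COMMIT;

-- Session A: only now does its change become visible to others.
COMMIT;
```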

So if three transactions are started at the same time, the max() value will be the same for all three of them, resulting in the same generated identifier. With this approach, the only way to avoid that is to lock the entire table exclusively before you insert a row, which means you cannot have more than one transaction inserting rows at any given time.
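For illustration only, the locking variant could look roughly like the sketch below. It reuses the table, columns, and literal values from the question; the subquery alias `n` and the choice of EXCLUSIVE lock mode are assumptions of this sketch, not something the question prescribes.

```sql
BEGIN;

-- EXCLUSIVE mode conflicts with itself and with all writes, so a second
-- transaction running this script waits here until the first one commits.
-- Plain SELECTs from other sessions are still allowed.
LOCK TABLE ticket IN EXCLUSIVE MODE;

-- Compute the next index once and use it for both columns.
UPDATE ticket AS t
SET ticket_number_index = n.next_index,
    ticket_number = 'SOMEPREFIX\F717\PT' || to_char(n.next_index, 'fm00000')
FROM (SELECT COALESCE(MAX(ticket_number_index), 0) + 1 AS next_index
      FROM ticket
      WHERE file_id = 'D530') AS n
WHERE t.id = 'D3571'
RETURNING t.*;

COMMIT;
```

Every ticket-numbering transaction is serialized behind that lock, which is exactly the scaling problem described in the next paragraph.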

In general, using this approach is a really bad idea, because implementations are either broken (as you have discovered) or won't scale because of the locking needed.

The only scalable and safe way to generate unique numbers is to use a sequence.

There is also no need to store the prefix twice. Store the prefix (e.g. 'SOMEPREFIX\F717\PT') in one column and the unique number generated by a sequence in another column. If you need to display them as one value, do that in the application.
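A minimal sketch of the sequence-based variant follows. The sequence name `ticket_number_seq` and the column `ticket_prefix` are made up for this example, and `ticket_number_index` is assumed to be an integer column.

```sql
-- Hypothetical sequence name; nextval() never hands out the same value twice,
-- even to concurrent transactions.
CREATE SEQUENCE ticket_number_seq;

-- Option 1: let the database assign the number at INSERT time.
ALTER TABLE ticket
  ALTER COLUMN ticket_number_index SET DEFAULT nextval('ticket_number_seq');

-- Option 2: assign it to an already inserted row, as in the question.
UPDATE ticket
SET ticket_number_index = nextval('ticket_number_seq')
WHERE id = 'D3571'
RETURNING ticket_number_index;

-- Combine prefix and number only for display.
-- ticket_prefix is a hypothetical column holding e.g. 'SOMEPREFIX\F717\PT'.
SELECT ticket_prefix || to_char(ticket_number_index, 'fm00000') AS ticket_number
FROM ticket
WHERE id = 'D3571';
```

Note that a single sequence produces numbers that are unique across the whole table rather than restarting at 1 for each file, and that gaps appear whenever a transaction that called nextval() rolls back.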

Related Solutions

Sql-server – Transaction level difference between using a large IN filter VS. “BEGIN TRAN/COMMIT”

From a logging point of view, #1 and #2 will be about the same, as in both cases you are doing all the deletes within a single transaction. There will be lots of locking and probably blocking while the delete is running. #3 is a single transaction per batch, so users won't be impacted very much, and you'll have lots of small transactions in the transaction log instead of one large one. #2 and #3 should take about the same amount of time to run. #1 should take less time than #2 and #3 because it's just one command; however, the run time will probably still cause problems.

I'd probably want to do something like this. This will minimize the locking and blocking that needs to happen by only dealing with 1000 rows at a time.

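-- Prime @@ROWCOUNT with 1 so that the loop is entered
SELECT NULL
-- Repeat until a DELETE removes no more rows
WHILE @@ROWCOUNT <> 0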
BEGIN
  DELETE TOP (1000) FROM dbo.x
  WHERE ID IN ( /* if possible use "BETWEEN 1 and 100000" instead*/
  1
  ,2
  ,3
  ,4
  ...
  ,100000
  )
END
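If the ids really do form a contiguous range, the BETWEEN variant mentioned in the comment could be sketched as follows. The variable `@rows` is introduced here only for illustration, and the range 1 to 100000 is taken from the comment above.

```sql
-- Track the row count in a variable instead of relying on SELECT NULL.
DECLARE @rows INT = 1;

WHILE @rows > 0
BEGIN
  -- Each DELETE is its own small transaction in autocommit mode.
  DELETE TOP (1000) FROM dbo.x
  WHERE ID BETWEEN 1 AND 100000;

  -- Capture the count immediately; later statements would overwrite it.
  SET @rows = @@ROWCOUNT;
END
```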

Database atomic operations implementation

A transaction is started for each statement that occurs outside of an explicit transaction block. Whether a commit is automatically issued after the statement depends on the RDBMS configuration. MySQL has the autocommit option, SQL Server has IMPLICIT_TRANSACTIONS, and PostgreSQL always autocommits such statements.

PostgreSQL:

In the standard, it is not necessary to issue START TRANSACTION to start a transaction block: any SQL command implicitly begins a block. PostgreSQL's behavior can be seen as implicitly issuing a COMMIT after each command that does not follow START TRANSACTION (or BEGIN), and it is therefore often called "autocommit". Other relational database systems might offer an autocommit feature as a convenience.
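As a small illustration of that behavior, here is a sketch that reuses the ticket table from the question above. The values 1 and 2 are arbitrary, and no other session is assumed to touch the row.

```sql
-- Outside a transaction block: committed as soon as the statement succeeds.
UPDATE ticket SET ticket_number_index = 1 WHERE id = 'D3571';

-- Inside an explicit block: nothing is permanent until COMMIT.
BEGIN;
UPDATE ticket SET ticket_number_index = 2 WHERE id = 'D3571';
ROLLBACK;  -- discards the second UPDATE; the row keeps the value 1
```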

InnoDB:

In InnoDB, all user activity occurs inside a transaction. If autocommit mode is enabled, each SQL statement forms a single transaction on its own. By default, MySQL starts the session for each new connection with autocommit enabled, so MySQL does a commit after each SQL statement if that statement did not return an error.

SQL Server:

SQL Server operates in the following transaction modes.

Autocommit transactions - Each individual statement is a transaction.

Explicit transactions - Each transaction is explicitly started with the BEGIN TRANSACTION statement and explicitly ended with a COMMIT or ROLLBACK statement.

Implicit transactions - A new transaction is implicitly started when the prior transaction completes, but each transaction is explicitly completed with a COMMIT or ROLLBACK statement.
