PostgreSQL – Is it *always* bad to have long-running transactions, or is this case OK

postgresql

I do this:

BEGIN;
TRUNCATE TABLE test RESTART IDENTITY;
-- Tons of INSERTS to the test table here, taking minutes or many seconds each time.
COMMIT;

There is nothing fundamentally wrong with this, is there? Even though the transaction takes a long time before it finally commits, in this case it's simply necessary for the whole operation to be "atomic" so that other scripts never see an empty or half-populated test table, right?

The other queries will still continue to work while it's taking its sweet time to finish, right? This is where my understanding of transactions falls apart. It seems to me that it's guaranteed that I will be able to SELECT from the table while this transaction block is running, but is it also possible to INSERT and DELETE and UPDATE?

My brain cannot comprehend how exactly the database would handle this. Is this what a deadlock is? Or will they continue INSERTing and UPDATEing and DELETEing from a "ghost table", with all changes lost when the long-running transaction finally COMMITs? That doesn't sound right. I assume that it would refuse any "changing" operation on the table while that transaction is running, correct?

Yes, I actually did read the manual again today on transactions and deadlocks. Some people are able to read something once and understand it perfectly, while others (like me) can spend many years, repeatedly convincing themselves that they have understood something, only to see "nonsensical" errors logged and Bad Things happening to their live production applications because of misunderstandings of how complicated things like this really work.

Best Answer

There is nothing fundamentally wrong with this, is there? Even though the transaction takes a long time before it finally commits,

The risk is locks. Only one transaction can update a given row at a time. There are various reasons for this; a few easy-to-understand cases are:

  1. Update statements which depend on the current state, such as update blah set count = count + 1. What would happen if two processes each read a row with count = 4, each added 1 to get 5, and both wrote 5 back? One of the increments would be lost (see the sketch just after this list).
  2. Indexes - select first_name, last_name from users where last_name = 'Jones'; - imagine there is an index on last_name. Great! We can optimize the query by scanning the index and then, for each matching entry, going out to the table to retrieve the two columns. Now imagine a transaction had updated the table but not yet the index: we come along, run that same select, and it suddenly returns Jane Doe, a row that no longer matches. Indexes are expected to be consistent with the table.
  3. Cascading deletes - if table B defines a foreign key constraint referring to table A with on delete cascade, what happens if you query table B before the cascade delete has happened?
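As a sketch of the first case (counters is a made-up table used purely for illustration), here is the interleaving that would corrupt the result if nothing prevented it:

-- Made-up table: counters(id integer primary key, count integer),
-- where the row with id = 1 currently has count = 4.
-- Both sessions run the same statement:
UPDATE counters SET count = count + 1 WHERE id = 1;

-- Without a lock, the interleaving could be:
--   session A reads count = 4
--   session B reads count = 4
--   session A writes 5
--   session B writes 5    -- one increment is lost; the correct result is 6
-- With the row lock, session B instead waits for session A to commit,
-- re-reads count = 5, and writes 6.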

The solution to all of these problems, and more, is locking. Postgres locks a row when it is updated, so any other update to that row has to wait for the lock to be released. When running in a transaction, locks continue to be held until the transaction is committed or rolled back.
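Applied to the transaction in the question, that means roughly the following for another session that tries to write to test while the load is still running (DEFAULT VALUES is just a stand-in, since the real column list isn't shown; any insert into test would behave the same way):

-- Session A (the long-running load):
BEGIN;
TRUNCATE TABLE test RESTART IDENTITY;   -- acquires a lock on test, held until COMMIT or ROLLBACK
-- ... minutes of INSERTs into test ...

-- Session B, meanwhile:
INSERT INTO test DEFAULT VALUES;        -- blocks here, waiting for session A's lock

-- Session A:
COMMIT;                                 -- the lock is released and session B's INSERT proceeds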

It seems to me that it's guaranteed that I will be able to SELECT from the table while this transaction block is running, but is it also possible to INSERT and DELETE and UPDATE?

In PostgreSQL, I believe that is correct.

My brain cannot comprehend how exactly the database would handle this. Is this what a deadlock is?

A deadlock is when multiple transactions, running concurrently, each require the same locks, but request them in a different order.

Process A wants to lock rows X and Y. Process B wants to lock rows Y and X. They each go ahead: process A locks row X and process B locks row Y. Now what do they do? Each is waiting on the other, so neither makes progress, and they wait forever. That is what creates the "dead" in deadlock: the processes appear to be dead because they are sleeping forever.
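A concrete sketch of that ordering problem, using a made-up accounts table. (In practice PostgreSQL does not let the two sessions sleep forever: its deadlock detector notices the cycle after deadlock_timeout and aborts one of the two transactions with "ERROR: deadlock detected", which is exactly the kind of "nonsensical" logged error the question mentions.)

-- Made-up table: accounts(id integer primary key, balance numeric).

-- Process A:
BEGIN;
UPDATE accounts SET balance = balance - 10 WHERE id = 1;   -- A locks row 1

-- Process B:
BEGIN;
UPDATE accounts SET balance = balance - 10 WHERE id = 2;   -- B locks row 2

-- Process A:
UPDATE accounts SET balance = balance + 10 WHERE id = 2;   -- waits for B's lock on row 2

-- Process B:
UPDATE accounts SET balance = balance + 10 WHERE id = 1;   -- waits for A's lock on row 1: deadlock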

Or will they continue INSERTing and UPDATEing and DELETEing from a "ghost table", with all changes lost when the long-running transaction finally COMMITs? That doesn't sound right. I assume that it would refuse any "changing" operation on the table while that transaction is running, correct?

Selects should work fine because, for insert, update, and delete, Postgres creates new versions of the rows and keeps track of which transactions see which versions. If you're not in a transaction and simply select a row, you should see the last committed version of the row as of the time you query it. Inserts, updates, and deletes, however, will block, waiting in line for their turn.
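A small sketch of that versioning behaviour for an ordinary row change, reusing the users table from the index example above (and assuming it has an id column): readers keep seeing the last committed version, while a competing write waits.

-- Session A:
BEGIN;
UPDATE users SET last_name = 'Smith' WHERE id = 42;   -- writes a new, uncommitted row version

-- Session B (no explicit transaction):
SELECT last_name FROM users WHERE id = 42;            -- still sees the last committed value
UPDATE users SET last_name = 'Brown' WHERE id = 42;   -- blocks until session A commits or rolls back

-- Session A:
COMMIT;                                               -- session B's UPDATE now proceeds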

Nothing should error. To a reasonable degree, blocking is normal; it is the database protecting your data. However, blocking is a scalability problem, because it means only one transaction at a time can change a given piece of data. Like all performance concerns, it's a question of your workload and what tradeoffs you can tolerate. Blocking is a common problem encountered when troubleshooting database performance.

So what's the problem with long-running transactions?

Any rows updated in a transaction may cause other transactions to wait. Those transactions may themselves be holding locks, and now they are stalled, which may cause them to block still others. This pattern leads to blocking chains and, in an OLTP workload, can degrade performance substantially.
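If you want to see whether that is happening, one way (PostgreSQL 9.6 and later, which provide pg_blocking_pids()) is to ask pg_stat_activity which sessions are currently blocked and by whom:

-- List sessions that are waiting on someone else's lock, and the PIDs blocking them.
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       state,
       query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;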

The premise is simple: hold as few locks as you need, for as short a time as possible.

What to do in your case?

It depends. Is the truncate statement being run in production or only in test environments? Are there foreign keys involved?

It's odd that the inserts take so long for you. Can you batch them into a single statement, or keep the rows in a second table and, after the truncate, do an insert into test select ... from second_table?
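A sketch of that second-table idea (second_table stands for whatever staging table already holds the prepared rows, assumed here to have the same column layout as test): the slow work of producing the rows happens outside the transaction, and the transaction itself only spans one TRUNCATE plus one INSERT ... SELECT.

-- Populate second_table first, outside any long transaction, however long that takes.
-- Then swap the data in quickly:
BEGIN;
TRUNCATE TABLE test RESTART IDENTITY;
INSERT INTO test SELECT * FROM second_table;   -- one set-based statement instead of many small INSERTs
COMMIT;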

It really depends on what problem it's causing you, if any. If the table isn't used by most of your queries, then maybe your transaction is just fine. If you're getting blocking issues (issue defined as something impacting users) in production, then you should try to optimize the inserts.