PostgreSQL Deadlock – Understanding ‘While Rechecking Updated Tuple’

deadlockpostgresql

Question: Can someone please decipher while rechecking updated tuple for me in the following:

2019-02-20 10:34:26 UTC:1.2.3.4(17866):user@database:[31391]:ERROR:  deadlock detected
2019-02-20 10:34:26 UTC:1.2.3.4(17866):user@database:[31391]:DETAIL:
    Process 31391 waits for ShareLock on transaction 770573675; blocked by process 25788.
    Process 25788 waits for ShareLock on transaction 770573695; blocked by process 31391.
    Process 31391: update my_table row_to_update set col_to_update = now() from (select id from my_table where col_to_update is null order by id limit $4 for no key update) matched_row_to_update where row_to_update.id = matched_row_to_update.id returning row_to_update.id as id, row_to_update.col_to_update as col_to_update, row_to_update.other_col as other_col
    Process 25788: update my_table set col_to_update=$1 other_col=$2 where id=$3
2019-02-20 10:34:26 UTC:1.2.3.4(17866):user@database:[31391]:HINT:  See server log for query details.
2019-02-20 10:34:26 UTC:1.2.3.4(17866):user@database:[31391]:CONTEXT:  while rechecking updated tuple (36328,33) in relation "my_table"
2019-02-20 10:34:26 UTC:1.2.3.4(17866):user@database:[31391]:STATEMENT:  update my_table row_to_update set col_to_update = now() from (select id from my_table where col_to_update is null order by id limit $4 for no key update) matched_row_to_update where row_to_update.id = matched_row_to_update.id returning row_to_update.id as id, row_to_update.col_to_update as col_to_update, row_to_update.other_col as other_col

I found the it here but couldn't figure it out what exactly is happening there. what is the lock? and how can I debug/resolve it?

To be clear I understand what a deadlock is. I'm asking, based on the error message what is happening, or very simply what the error message means. For example, https://bit.ly/2TaWh6c or https://bit.ly/2Sk0lg0 or https://bit.ly/2E2v9g9 describe different situations with different error message. But I couldn't find anything about the error I'm getting.

Full story:
We have a service (multiple instances working on single DB) which does the following:

Thread #1
1. Fetch some rows from DB to process in transaction #1 (Process 31391).
2. Process fetched rows, update them.
3. Save all processed/updated rows back to DB in transaction #2 (Process 25788).

Thread #2.
1. Periodically check rows in DB which are in process mode for too long and release them by checking and setting col_to_update to null in transaction #3 (since we're using JPA/Hibernate, effectively the query is the same as Process 25788).

Previously we had deadlock errors with the message: while updating tuple (xxx,xxx) in relation my_table between two transactions with the same query (query of Process 31391). We managed to solve it by adding order by id for no key update.

Now we're getting the deadlock above which I don't understand. Even an answer to the following question would help: what's the difference between updating tuple and rechecking updated tuple?

Best Answer

Even an answer to the following question would help: what's the difference between updating tuple and rechecking updated tuple?

I will take you up on this part. When PostgreSQL finds a row that meets the WHERE clause to be updated, but that tuple has already been modified by a not-yet-committed transaction, it will wait for that transaction to commit and then recheck that the new version of the row still meets the WHERE clause.

So 31391 was wanting to update a tuple that 25788 had already updated (presumably in a prior statement in 25788's transaction), and was waiting for 25788 to finish so it could recheck that it still wanted to update it.

25788 was waiting or 31391 to finish, but we don't know why it was waiting for as it didn't log a CONTEXT line.

Can you tell if 25788 was implementing the 3rd step of thread #1, or is it implementing thread #2?

Does your thread #2 issue its 25788-like query each in its own transaction?