Postgresql – Difficulty understanding Postgres synchronous_commit = off and data loss

postgresql

I am confused about what "data loss" means in the context of synchronous_commit = off.

It sounds like synchronous_commit = off is a fairly safe way to increase write performance: it may result in "data loss", but not in "data corruption", which sounds a lot worse.

So, as I understand it, if things crash I might have lost some data from the second or so before the crash. But the database is still consistent and not "corrupted", so I can simply re-run what didn't go through once it's back up, correct? My database does not end up in a state where things "kinda happened" and all the data is messed up?
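For reference, the setting does not have to be all-or-nothing; a minimal sketch of the scopes at which it can be changed (the values here are purely illustrative):

    -- Cluster-wide default, in postgresql.conf (applied on reload):
    -- synchronous_commit = off

    -- Per session:
    SET synchronous_commit TO off;

    -- Per transaction (reverts when the transaction ends):
    BEGIN;
    SET LOCAL synchronous_commit TO off;
    -- ... statements whose commits may be lost on a crash ...
    COMMIT;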

I think it would also help my understanding to see examples where synchronous_commit = off is NOT OK to use.

In the documentation (https://www.postgresql.org/docs/11/wal-async-commit.html) it says:

Thus asynchronous commit should not be used if the client will take external actions relying on the assumption that the transaction will be remembered. As an example, a bank would certainly not use asynchronous commit for a transaction recording an ATM's dispensing of cash. But in many scenarios, such as event logging, there is no need for a strong guarantee of this kind.

I am confused because the ATM cash-dispensing example is too obvious, and the event logging example does not go into enough depth for me to understand the difference.

Say we wanted to sequentially save 3 items in our database table called numbers: [1, 2, 3]
And we have a client that queries the table numbers for what was last saved, and will resume from where it left off.

numbers
 id
----
  1
So if we run our client, it will find 1 and start at 2. Now if we write 2, with asynchronous commit we get a premature success signal back, but then the database suddenly crashes before the row is actually persisted. If we then restart our client, would it resume from 2 again (correct), or skip it and start with 3 (incorrect)?
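To make this concrete, here is a minimal SQL sketch of the scenario (the table comes from the question; the max(id) resume query is an assumption about how such a client might work):

    -- Table and existing row from the question
    CREATE TABLE numbers (id integer PRIMARY KEY);
    INSERT INTO numbers VALUES (1);

    -- Client start-up: find the last saved item and resume after it
    SELECT max(id) FROM numbers;   -- returns 1, so the client writes 2 next

    -- Write 2 with asynchronous commit
    SET synchronous_commit TO off;
    BEGIN;
    INSERT INTO numbers VALUES (2);
    COMMIT;   -- success is reported before the WAL record is flushed to disk

    -- If the server crashes before the background WAL writer flushes that
    -- record, row 2 is gone after recovery, and the resume query above
    -- would return 1 again.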

Best Answer

so I can simply re-run what didn't go through once it's back up, correct?

How would you know to do so?

Say we wanted to sequentially save 3 items in our database table called numbers: [1, 2, 3] And we have a client that queries the table numbers for what was last saved, and will resume from where it left off.

So it saves all three. It queries the database and sees all three. Then it says "OK, I'm good" and exits. But then the database crashes, and when it comes back up only the first one is there. Unless the client is periodically rechecking, it has no way to know that anything was lost, and therefore no way to fix it.


Another scenario: one client commits all three rows, and a second program immediately queries, sees all three, and uses that data to make a decision. If the second program's decision is recorded exclusively in the database, you are fine: if any of those 3 rows are lost, the recorded decision is lost along with them, so the database remains self-consistent. But if that second program sent its decision to something outside the database, you can have a problem, because the external action survives the crash while the data the decision was based on might no longer exist once the database comes back up.
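A pattern that follows from this (a sketch, not something from the original answer): keep synchronous_commit = off as the default for cheap writes, but override it for the one transaction whose commit will trigger an external action, since the setting can be changed per transaction. The decisions table below is purely hypothetical.

    -- Assume postgresql.conf has: synchronous_commit = off

    -- Hypothetical table recording decisions that drive external actions
    CREATE TABLE decisions (based_on_id integer, action text);

    BEGIN;
    SET LOCAL synchronous_commit TO on;  -- this transaction alone waits for the WAL flush
    INSERT INTO decisions (based_on_id, action) VALUES (3, 'notify-external-system');
    COMMIT;  -- once this returns, the record survives a server crash

    -- Only after COMMIT returns should the external action be performed.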