tldr
After spending time with bad databases which I will not mention, I came to question even the point of a database.
I knew that they could serve a purpose to keep permanent data organized, but that was it.
After delving into Postgres with the great help of especially Craig Ringer & Erwin Brandstetter, I've now come to see what a database could and should be:
- Convenient syntax
- Customizable storage methods for performance relative to persistence
- Organized writes and even reads through serialization
- Confidence that writes should never be incomplete
…and so on.
The point
The crash course provided has made me just enough knowledgeable on 1) – 3) to be dangerous, but I know no details about 4).
It seems like well-written database software like Postgres are designed to ensure that a transaction will never commit incomplete data.
Speaking out of ignorance, nothing is perfect, so in that vein, how statistically likely is it that Postgres will incompletely commit a transaction and not be able to detect the malfunction?
Best Answer
Barring undetected PostgreSQL or operating system kernel / C library bugs, and in the presence of good quality hardware with ECC memory, nil.
PostgreSQL's write-ahead logging, where it:
is extremely robust.
You do need to ensure that your operating system and hardware honours fsync requests properly. The
pg_test_fsync
program is useful for this, though the results require interpretation. Also, beware of cheap SSDs without proper power-failure protection.The main risks to data integrity are:
Anyone who gives you "statistics" or numbers on things like the probability of a future bug causing issues is likely full of it.
There's been lots of work done on transient memory errors; the risk is real but the probability is low. I've never seen a bit-flip error in the wild, but I have colleagues who have. Google has lots of info.
The most important thing is proper backup and admin processes.
I wrote a bit more about this topic a while ago.
BTW, there are a few exceptions to the rule that effects aren't visible until they're committed:
SEQUENCE
s (and theSERIAL
columns that use them)