I'm struggling with a problem of folk memory re Postgres.
The folklore
I am told that a long time ago in a galaxy far, far away (PostgreSQL 8.4 under Ubuntu 12.04), PostgreSQL frequently suffered from an issue where if the machine holding the server failed (e.g. kernel crash), when the machine was restarted, Postgres would start but the indexes on the database might be corrupted. For instance
SELECT foo FROM bar WHERE baz = 123
would fail because the index on bar.baz
would be corrupted. Reindexing the table fixed the issue but as there was no way to detect this issue, and this was in an appliance environment (no DBA on hand), in essence to guard against this it would be necessary to reindex all tables on a reboot, which would be prohibitively expensive. Thus (the reasoning goes) without a DBA on hand it would not be sensible to use Postgres particularly in an environment where tables were large and computing indices expensive.
Because of this folklore, MySQL is preferred in such circumstances.
My theory
-
I dimly recall seeing this issue happening under 8.4, so I think there's some truth in it.
-
I am assured this also has happened on Postgres 9.3 under Ubuntu 14.0.4. I don't know whether it happens 'frequently'. The system it happened on is running with (admittedly old) enterprise grade SAS drives which claim write caching is turned off, so I don't think this is a write caching problem.
-
Were this still the case, I think I'd be able to find more evidence of it on the web than I can find.
-
Thus I suspect it may be fixed, or may be a result of something bad in
postgresql.conf
(though inspection suggests only the autovacuum settings are changed from the defaults).
Further information
-
In the system I know did this recently (using 9.3) no LVM is in use. The storage system is ext4 on a physical drive mounted with default attributes. This drive is an (admittedly old) enterprise SAS drive plus SAS1068 PCI-X Fusion-MPT SAS controller which
sdparm
claims it doesn't have a write cache switched on, but this has (allegedly) been seen on multiple drives with different vendors etc., though always on ext4 on Ubuntu. -
fsync
is in the default state (on
I believe). -
No master/slave configuration – just a single master – (so it's not the
multixact
bug). -
Tables are created with default indexing (no fancy hash indexes).
-
Postgres regularly updated to whatever the relevant Ubuntu LTS release has. The latest problem was on 14.04.
Questions
-
Is manual rebuilding of indices an expected task after a system crash?
-
If not, has it ever been, and when was that changed?
-
Is it possible to avoid the need for manual reindexing?
Best Answer
TL;DR: Except for a specific bug in PostgreSQL 9.3 prior to 9.3.4 which it's vital that you patch, index corruption after crash is never expected and hasn't been since PostgreSQL 7.1 was released in 2001.
The advice/folklore you got was wrong.
That way predates 8.4 unless you're using hash indexes, which are documented not to be crash safe and are not recommended.
WAL logging of indexes was introduced in the deep dark distant past of mid-2001. PostgreSQL has been crash safe for an exceedingly long time. That advice was wrong.
MySQL switched to crash-safe by default (by defaulting to InnoDB tables) in MySQL 5.5, in 2009. Prior to that its default storage engine, MyISAM, isn't crash-safe and will happily corrupt indexes and tables. So your folklore isn't only wrong, it's backwards.
If you had corruption after a crash under 8.4, then either:
fsync=off
.There have been a couple of index-related bugs over time, but all the others I know of have been minor and/or hard to trigger, and few have been related to crash recovery.
The issue with 9.3 is different. That's a very specific bug that we're all rather embarrassed about. It occurred when a replica PostgreSQL server was promoted to a master after certain other operations had happened in a particular order. It's a really bad bug because it affects the table contents, not just indexes, so it's vital that you patch to 9.3.5 or higher. To learn more about it than you ever wanted to search for "multixact bug".
So:
Not unless you are using hash indexes, which the manual specifically wants you about.
PostgreSQL 7.1, back in 2001.
If you read the release notes you'll find the occasional bug and bug fix, but other than the aforementioned multixact bugs in 9.3, nothing too serious.
Do nothing. Congratulations, you don't need manual reindexing.
Simply:
fsync
set toon
, the defaultfull_page_writes
set toon
, the defaultkill -9
the postmaster but no other processes then deletepostmaster.pid
and start PostgreSQL again. The latest releases even detect it if you do this and try to prevent damage.Bonus points if you: