PostgreSQL not starting correctly: is it repairing itself first?

postgresql

I never saw that problem before. I had problems and many PostgreSQL processes were stuck so I killed them with a -KILL…

When I tried to restart, it said that it could not restart, but the daemon continues to run and uses a little CPU and a lot of I/O. Is it trying to repair the database?

I get no log at all. I think there is a way to increase log output… I'll look into that. At this point, the socket to connect to the server doesn't get created, but the server does not quit or emit any error/message, so I have no clue what is going on!?

If anyone has a clue, I'd be glad to hear about it.

Best Answer

I had problems and many postgresql processes were stuck so I killed them with a -KILL...

Don't do this. It won't cause data corruption, but as you've discovered it forces the whole database system to restart and do crash recovery.

If you hard-kill any database backend with SIGKILL (kill -9), the postmaster has to assume that shared memory might be corrupted, so it has to kill and restart all workers, doing crash recovery as if the server itself had crashed and restarted.

You should not need to SIGKILL a backend. Use a regular SIGTERM to ask it to stop what it's doing and exit, or better, use pg_terminate_backend(...) from within SQL. If the backend doesn't respond, SIGQUIT should force it to terminate.
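As a rough illustration of that escalation order, here is a minimal shell sketch. The function name stop_backend and the 10-second default grace period are made up for this example and are not something PostgreSQL ships; the sketch only relies on the standard kill utility.

```shell
# Hypothetical helper: stop one process gently, escalating only if needed.
# Usage: stop_backend PID [GRACE_SECONDS]   (default of 10 is an assumption)
stop_backend() {
    pid=$1
    grace=${2:-10}

    # Ask politely first: SIGTERM lets the process clean up and exit.
    # If the signal cannot be sent, the process is already gone.
    kill -TERM "$pid" 2>/dev/null || return 0

    i=0
    while [ "$i" -lt "$grace" ]; do
        # kill -0 sends no signal; it only checks that the PID still exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        i=$((i + 1))
    done

    # Still there after the grace period: escalate to SIGQUIT, never SIGKILL.
    kill -QUIT "$pid" 2>/dev/null
}
```

You would call it as something like `stop_backend 12345 30`, where 12345 is a placeholder for a PID taken from pg_stat_activity or ps. From SQL, the equivalent first step is `SELECT pg_terminate_backend(12345);`.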

If you do SIGKILL, crash recovery usually takes a few seconds, or minutes for really big and busy databases. However, if you have a long checkpoint_timeout and a huge checkpoint_segments (replaced by max_wal_size in PostgreSQL 9.5 and later), you can accumulate a lot of work that must be done before the DB can become available again. If you have quite slow disk I/O, this will be even worse.

PostgreSQL does produce logs while it's in recovery, and they're at a log level that makes it unlikely that they'd be suppressed. So it's hard to say what could be going on. Maybe you were looking at syslog, but your PostgreSQL install is configured to log directly to files in /var/log/pgsql or in the pg_log directory in the datadir?
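Since the server is not accepting connections in this situation, one way to find out where the logs go is to read the configuration file directly. The sketch below is only that, a sketch: the function name show_log_settings is made up, and the path to postgresql.conf differs between installs, so it is passed in as an argument.

```shell
# Hypothetical helper: list the active (uncommented) logging settings
# in a postgresql.conf file.
# Usage: show_log_settings /path/to/postgresql.conf
show_log_settings() {
    conf=$1
    # Match only lines that start with the setting name, so commented-out
    # defaults (lines beginning with '#') are skipped.
    grep -E '^[[:space:]]*(log_destination|logging_collector|log_directory|log_filename)([[:space:]]|=)' "$conf"
}
```

If it prints nothing, the built-in defaults apply (log_destination is stderr and logging_collector is off), in which case the output ends up wherever the init script or service manager redirects the server's stderr.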

(For anyone else reading this: never SIGKILL the postmaster, then delete the postmaster pid file and restart PostgreSQL while there are still old postgres backends running. That can cause data corruption, since you've disabled every safety measure PostgreSQL puts in place to stop you from starting a new postmaster while old backends might still be running.)