PostgreSQL – Postgres won’t shut down due to WAL archiving

postgresql, postgresql-9.1

I told Postgres to shut down with the init.d script (Linux) more than 18 hours ago.

I can still see the processes running:

-bash-3.2$ ps -fe | grep postg
postgres  2299  3265  0 16:06 pts/5    00:00:00 ps -fe
postgres  2300  3265  0 16:06 pts/5    00:00:00 grep postg
root      3263 10185  0 May23 pts/5    00:00:00 su - postgres
postgres  3265  3263  0 May23 pts/5    00:00:01 -bash
root      5985 13676  0 May20 pts/3    00:00:00 su - postgres
postgres  5987  5985  0 May20 pts/3    00:00:01 -bash
postgres 14266     1  0 May23 ?        00:06:34 /usr/pgsql-9.1/bin/postmaster -p 5432 -D /var/lib/pgsql/9.1/data
postgres 14268 14266  0 May23 ?        00:01:51 postgres: logger process
postgres 14270 14266  0 May23 ?        00:01:30 postgres: writer process
postgres 14281 14266  0 May23 ?        00:00:09 postgres: archiver process   last was 000000010000028F000000A3
postgres 14282 14266  0 May23 ?        00:03:07 postgres: stats collector process
postgres 14283 14266  0 May23 ?        00:56:49 postgres: wal sender process postgres 10.40.227.238(12032) streaming 28F/A4000650
postgres 14306 14266  9 May28 ?        04:01:55 postgres: opera_man om 10.40.227.146(44745) SELECT

On the standby server (which is running normally) I see this:

$ ps -fe | grep postg
cluser   20724  7090  0 09:54 pts/0    00:00:00 psql -U postgres report
postgres 20726 21475  0 09:54 ?        00:01:12 postgres: postgres report [local] idle
postgres 21475     1  0 Apr24 ?        00:00:03 /usr/pgsql-9.1/bin/postmaster -p 5432 -D /var/lib/pgsql/9.1/data
postgres 21477 21475  0 Apr24 ?        00:00:01 postgres: logger process
postgres 21478 21475  0 Apr24 ?        05:34:10 postgres: startup process   recovering 000000010000028F000000A4
postgres 21485 21475  0 Apr24 ?        00:07:16 postgres: writer process
postgres 21486 21475  0 Apr24 ?        00:00:18 postgres: stats collector process
postgres 24091 21475  0 May23 ?        00:46:49 postgres: wal receiver process   streaming 28F/A40006E0
cluser   32136 30224  0 16:09 pts/16   00:00:00 grep postg

The log shows 'FATAL: the database system is shutting down'. What could be causing this, and how do I get it running again?

Best Answer

The error message "FATAL: the database system is shutting down" is probably logged for new incoming connections, which are refused because the master process is in the shutdown state.

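According to "Shutting Down the Server" in the documentation, in a fast shutdown the master process sends SIGTERM to all existing server processes, asking them to abort their current transactions and exit as soon as possible. It then waits for all of those processes to exit before it finally shuts down itself.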

Yet there is this process:

postgres 14306 14266  9 May28 ?        04:01:55 postgres: opera_man om 10.40.227.146(44745) SELECT

that does not want to quit.
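To see which processes are still holding up the shutdown, one can list the postmaster's remaining children. This is a minimal sketch, assuming a Linux box with the procps `ps` (its `--ppid` option selects processes by parent PID); the function name `list_children` is hypothetical, and 14266 is simply the postmaster PID from the `ps` output above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print PID, elapsed time and command line of every
# process whose parent is the given PID (here: the postmaster).
# Assumes the procps ps; a trailing '=' on each -o field suppresses the header.
list_children() {
  local parent_pid="$1"
  ps -o pid=,etime=,args= --ppid "$parent_pid"
}

# Usage on the master: list_children 14266
```

Any backend that still shows up here long after the shutdown was requested is a candidate for the stuck process.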

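You may try the next option: sending SIGQUIT to the master process (pid 14266), which is the Immediate Shutdown mode. This is likely to make it quit, though it remains to be seen whether the subprocess with pid 14306 will abort quickly or stay stuck. Note that after an immediate shutdown the server goes through recovery (replaying the WAL) on the next start.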
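Sending the signal and then watching whether the process really goes away can be scripted. This is a minimal sketch; `signal_and_wait` is a hypothetical helper name and the 30-second default timeout is an arbitrary assumption:

```shell
#!/usr/bin/env bash
# Hypothetical helper: send signal $1 (e.g. QUIT) to PID $2, then poll once
# per second for up to $3 seconds (default 30) until the process is gone.
# Returns 0 if it disappeared, 1 if it is still alive after the timeout,
# 2 if the signal could not be sent (no such process, or no permission).
signal_and_wait() {
  local sig="$1" pid="$2" timeout="${3:-30}" waited=0
  kill -s "$sig" "$pid" 2>/dev/null || return 2
  # kill -0 delivers no signal; it only tests whether the PID still exists.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Usage on the master (immediate shutdown of the postmaster):
#   signal_and_wait QUIT 14266 30 && echo "postmaster has exited"
```

Polling with `kill -0` rather than using the shell's `wait` is deliberate: the postmaster is not a child of the administrator's shell, so `wait` cannot be used on it.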

If it's stuck, I would kill it with SIGKILL as a last resort and then restart Postgres normally with the init script.
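The last-resort step can be sketched the same way: give the stuck backend a grace period after the postmaster has received SIGQUIT, and only then send SIGKILL, which cannot be caught or ignored. This is a minimal sketch; `kill_if_stuck` and the 10-second grace period are hypothetical, and the init script path in the usage comment is an assumption based on the PGDG-style `/usr/pgsql-9.1` paths in the `ps` output above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait up to $2 seconds (default 10) for PID $1 to exit
# on its own; if it is still there, send SIGKILL and wait briefly for it to vanish.
# Returns 0 when the process is gone, 1 if it is still visible after SIGKILL
# (e.g. a process blocked in uninterruptible I/O).
kill_if_stuck() {
  local pid="$1" grace="${2:-10}" waited=0
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$grace" ]; then
      echo "PID $pid still alive after ${grace}s, sending SIGKILL" >&2
      kill -s KILL "$pid" 2>/dev/null
      # Give the kernel a few seconds to tear the process down.
      for _ in 1 2 3 4 5; do
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
      done
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Usage on the master, after SIGQUIT has been sent to the postmaster (14266);
# the init script path below is an assumption:
#   kill_if_stuck 14306 10 && /etc/init.d/postgresql-9.1 start
```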