I'm having trouble with a system that I didn't set up or configure myself, but pieces of the system seem to generally be in more or less default places. My current theory is that the PostgreSQL database on the (Ubuntu 16.04.3 LTS) system stopped working when the storage on the server completely filled up.
Storage has since been increased and some old files removed, but the PostgreSQL service/cluster still refuses to start. See below:
ubuntu@ip-<my-ip-here>:~$ sudo pg_ctlcluster 11 main start
Job for postgresql@11-main.service failed because a configured resource limit was exceeded. See "systemctl status postgresql@11-main.service" and "journalctl -xe" for details.
ubuntu@ip-<my-ip-here>:~$ systemctl status postgresql@11-main.service
● postgresql@11-main.service - PostgreSQL Cluster 11-main
Loaded: loaded (/lib/systemd/system/postgresql@.service; disabled; vendor preset: enabled)
Active: failed (Result: resources) since Mon 2023-01-09 16:24:10 AEST; 7s ago
Process: 2281 ExecStart=/usr/bin/pg_ctlcluster --skip-systemctl-redirect %i start (code=exited, status=1/FAILURE)
Jan 09 16:24:10 ip-172-31-23-196 postgresql@11-main[2281]: 2023-01-09 16:24:10.052 AEST [2289] CONTEXT: WAL redo at 7F/200AA78 for XLOG/FPI:
Jan 09 16:24:10 ip-172-31-23-196 postgresql@11-main[2281]: 2023-01-09 16:24:10.053 AEST [2288] LOG: startup process (PID 2289) exited with exit code 1
Jan 09 16:24:10 ip-172-31-23-196 postgresql@11-main[2281]: 2023-01-09 16:24:10.053 AEST [2288] LOG: aborting startup due to startup process failure
Jan 09 16:24:10 ip-172-31-23-196 postgresql@11-main[2281]: 2023-01-09 16:24:10.059 AEST [2288] LOG: database system is shut down
Jan 09 16:24:10 ip-172-31-23-196 postgresql@11-main[2281]: pg_ctl: could not start server
Jan 09 16:24:10 ip-172-31-23-196 postgresql@11-main[2281]: Examine the log output.
Jan 09 16:24:10 ip-172-31-23-196 systemd[1]: postgresql@11-main.service: Can't open PID file /run/postgresql/11-main.pid (yet?) after start: No such file or directory
Jan 09 16:24:10 ip-172-31-23-196 systemd[1]: Failed to start PostgreSQL Cluster 11-main.
Jan 09 16:24:10 ip-172-31-23-196 systemd[1]: postgresql@11-main.service: Unit entered failed state.
Jan 09 16:24:10 ip-172-31-23-196 systemd[1]: postgresql@11-main.service: Failed with result 'resources'.
The "Can't open PID file /run/postgresql/11-main.pid (yet?) after start: No such file or directory" part seems like a red herring: a problem that occurs after the actual problem has already happened. (Also, if I create that file with rights and ownership for the postgres user, it gets removed, as expected.)
So, I assume the problem is with the "CONTEXT: WAL redo at 7F/200AA78 for XLOG/FPI:" line, followed by "LOG: startup process (PID 2289) exited with exit code 1" and "aborting startup due to startup process failure". Something apparently errors out during 'WAL redo'.
The line "Examine the log output." sounds wise, but I can't find a log in any of the places I'd expect it:
ubuntu@ip-<my-ip-here>:~$ ls /etc/postgresql/11/main
conf.d environment pg_ctl.conf pg_hba.conf pg_ident.conf postgresql.conf start.conf
And the postgresql.conf doesn't offer any clues as to where I'd be able to find it otherwise, so I'm assuming it doesn't get as far as writing it, perhaps?
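For reference, on Debian/Ubuntu packaging the server log normally lives outside /etc/postgresql. The following is a minimal sketch for locating and reading it, not a definitive procedure. It assumes the postgresql-common tools are installed and that the cluster logs to /var/log/postgresql/postgresql-11-main.log. The LOGFILE variable and the show_log_tail helper are names made up for this illustration.

```shell
#!/bin/sh
# Assumed default log path for cluster 11/main on Debian/Ubuntu packaging;
# override LOGFILE if the cluster logs elsewhere.
LOGFILE="${LOGFILE:-/var/log/postgresql/postgresql-11-main.log}"

# pg_lsclusters (from postgresql-common) lists every cluster, including a
# "Log file" column. Skipped quietly if the tool is not installed.
if command -v pg_lsclusters >/dev/null 2>&1; then
    pg_lsclusters || true
fi

# Hypothetical helper: print the last N lines (default 50) of a log file.
show_log_tail() {
    # $1: log file, $2: number of lines
    if [ -r "$1" ]; then
        tail -n "${2:-50}" "$1"
    else
        echo "cannot read $1 (try sudo, or check the Log file column)" >&2
        return 1
    fi
}

show_log_tail "$LOGFILE" || true
```

The log may only be readable by root or members of the adm group, so the script may need to be run with sudo.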
Any advice on how to get the server to progress beyond this point is welcome. I wouldn't mind losing whatever data it's trying to consolidate from the WAL redo – I'm mainly interested in getting the system back up and running.
Best Answer
As user @user1686 correctly pointed out, the logs for PostgreSQL on Ubuntu are to be found in /var/log/postgresql, in a file called postgresql-11-main.log, as I was interested in the main cluster and am running PostgreSQL 11.
In that log file the following was recorded:
This explains why the redo fails: there is no space left on the device holding pg_tblspc/17293.
After the following:
And:
I realised that I had to increase the size of the xvdf volume, which I was able to do by increasing the size of the volume on the VM and then growing the file system using sudo resize2fs /dev/xvdf (as it was an ext4 file system). After that, I was able to restart the PostgreSQL cluster and resolve my further issues.