I have a stand-alone instance of PostgreSQL which is in recovery mode. It has been saying
2014-03-24 18:45:57 MDT FATAL: the database system is starting up
For many hours, and ps
shows:
postgres 2637 0.1 0.1 116916 4420 ? Ds 15:43 0:18 postgres: startup process recovering 00000001000000040000007E
Is there any way to watch the progress of the recovery, ideally with an ETA? How can I get this process "un-stuck"?
I can stop postgres using the standard start/stop scripts, but when I start it again, it's still stuck in recovery mode.
Debian 7.4, Linux kernel 3.2.0, PostgreSQL 9.1.12
Output of strace -p 2637
:
Process 2637 attached - interrupt to quit
close(154) = 0
getppid() = 2600
open("pg_clog/0003", O_RDWR|O_CREAT, 0600) = 154
lseek(154, 221184, SEEK_SET) = 221184
write(154, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
fsync(154) = 0
The above output repeats seemingly infinitely.
A gdb backtrace (three subsequent bts were all practically identical):
(gdb) bt
#0 0x00007f0c8f917d70 in fsync () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f0c9169f255 in pg_fsync_no_writethrough (fd=<optimized out>) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/storage/file/fd.c:286
#2 0x00007f0c9169f265 in pg_fsync (fd=<optimized out>) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/storage/file/fd.c:274
#3 0x00007f0c9151917b in SlruPhysicalWritePage (ctl=ctl@entry=0x7f0c91b856c0, pageno=pageno@entry=123, slotno=slotno@entry=3, fdata=fdata@entry=0x0) at /home/cbe/projects/postgresql/9.1/postgresql-9.1- 9.1.12/build/../src/backend/access/transam/slru.c:801
#4 0x00007f0c91519925 in SlruInternalWritePage (ctl=ctl@entry=0x7f0c91b856c0, slotno=3, fdata=fdata@entry=0x0) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/access/transam/slru.c:528
#5 0x00007f0c9151a1c7 in SimpleLruWritePage (ctl=ctl@entry=0x7f0c91b856c0, slotno=<optimized out>) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/access/transam/slru.c:565
#6 0x00007f0c915018e3 in clog_redo (lsn=..., record=<optimized out>) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/access/transam/clog.c:720
#7 0x00007f0c91515f08 in StartupXLOG () at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/access/transam/xlog.c:6664
#8 0x00007f0c91518537 in StartupProcessMain () at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/access/transam/xlog.c:10179
#9 0x00007f0c9152f1ba in AuxiliaryProcessMain (argc=<optimized out>, argc@entry=2, argv=<optimized out>, argv@entry=0x7fff651b0610) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/bootstrap/bootstrap.c:409
#10 0x00007f0c9167ddeb in StartChildProcess (type=StartupProcess) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/postmaster/postmaster.c:4512
#11 0x00007f0c916806ac in PostmasterMain (argc=argc@entry=5, argv=argv@entry=0x7f0c91d8e180) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/postmaster/postmaster.c:1118
#12 0x00007f0c914d392f in main (argc=5, argv=0x7f0c91d8e180) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/main/main.c:209
And if I set:
fsync = off
And run the bt again:
(gdb) bt
#0 0x00007f3d99ab7160 in write () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f3d9b6be0c6 in SlruPhysicalWritePage (ctl=ctl@entry=0x7f3d9bd2a6c0, pageno=pageno@entry=123, slotno=slotno@entry=3, fdata=fdata@entry=0x0) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/access/transam/slru.c:783
#2 0x00007f3d9b6be925 in SlruInternalWritePage (ctl=ctl@entry=0x7f3d9bd2a6c0, slotno=3, fdata=fdata@entry=0x0) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/access/transam/slru.c:528
#3 0x00007f3d9b6bf1c7 in SimpleLruWritePage (ctl=ctl@entry=0x7f3d9bd2a6c0, slotno=<optimized out>) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/access/transam/slru.c:565
#4 0x00007f3d9b6a68e3 in clog_redo (lsn=..., record=<optimized out>) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/access/transam/clog.c:720
#5 0x00007f3d9b6baf08 in StartupXLOG () at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/access/transam/xlog.c:6664
#6 0x00007f3d9b6bd537 in StartupProcessMain () at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/access/transam/xlog.c:10179
#7 0x00007f3d9b6d41ba in AuxiliaryProcessMain (argc=<optimized out>, argc@entry=2, argv=<optimized out>, argv@entry=0x7fff99d477a0) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/bootstrap/bootstrap.c:409
#8 0x00007f3d9b822deb in StartChildProcess (type=StartupProcess) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/postmaster/postmaster.c:4512
#9 0x00007f3d9b8256ac in PostmasterMain (argc=argc@entry=5, argv=argv@entry=0x7f3d9d614180) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/postmaster/postmaster.c:1118
#10 0x00007f3d9b67892f in main (argc=5, argv=0x7f3d9d614180) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/main/main.c:209
Best Answer
After googling for hours, I stumbled across a thread that wasn't really related to my issue, best I could tell, but it seemed harmless enough to try, and voilĂ !
I looked at
/var/lib/postgresql/9.1/main/pg_clog
, and saw:Noticing that
/var/lib/postgresql/9.1/main/pg_clog/0003
was the file being opened/seeked in thestrace
output, and that this was the file in question in the forum post, I tried the suggested action in the forum:And now postgres starts again.