journalctl --verify reports corruption

systemd systemd-journald

I just noticed the --verify option for journalctl and decided to give it a go. It reports corruption. What might cause that, and what, if anything, should I do about it? Should I investigate further?

journalctl --verify
PASS: /var/log/journal/19184893a1d645c7a43729e79b10a876/user-1000.journal
Invalid object contents at 3733856░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░   0%
File corruption detected at /var/log/journal/19184893a1d645c7a43729e79b10a876/system.journal:3733856 (of 91734016, 4%).
FAIL: /var/log/journal/19184893a1d645c7a43729e79b10a876/system.journal (Bad message)
Invalid object contents at 21575496░░░░░░░░░░░░░░░░░░░░░░░░░░░░░  45%
File corruption detected at /var/log/journal/19184893a1d645c7a43729e79b10a876/system@60e058db556e4de4b256d0b1ff176aa4-0000000000000001-0004e0b436d20aa1.journal:21575496 (of 44052480, 48%).
FAIL: /var/log/journal/19184893a1d645c7a43729e79b10a876/system@60e058db556e4de4b256d0b1ff176aa4-0000000000000001-0004e0b436d20aa1.journal (Bad message)
PASS: /var/log/journal/19184893a1d645c7a43729e79b10a876/user-1000@60e058db556e4de4b256d0b1ff176aa4-0000000000000a91-0004e0b4ff9a949a.journal
PASS: /var/log/journal/19184893a1d645c7a43729e79b10a876/user-1001.journal

Best Answer

Currently, journalctl can detect corrupt logs but has no "fsck"-type command to attempt a repair. journald automatically switches to writing a new, "clean" file as soon as it detects the problem, so in theory data loss should be minimal.

Until there is a file-repairing command, finding the corrupt journal file and removing it is the only cure. You can find more on this in our Fedora mega-thread about making journal-only logging the default:

For tail corruptions your normal journalctl tool will provide you with as much information as is possible to salvage from the file. It will output the last complete log line and then finish. This is pretty close to how good you can get.

Things are different for corruptions in the middle. We have no nice tool for salvaging data from such corruption, but they could be written relatively easily. However, since they are highly unlikely due to the "append-only" model of the journal this hasn't been on our TODO list.

Of course, if you can identify what caused the problem initially and report it, that would be nice.
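
The "find the corrupt file and remove it" step can be sketched in shell. This is a minimal sketch, not an official repair procedure: it assumes the FAIL: line format shown in the question's output, the function names are made up for illustration, and the quarantine directory /var/log/journal-corrupt is a hypothetical location. It moves the files aside instead of deleting them, so they stay available for a bug report.

```shell
#!/bin/sh
# Print the paths of journal files that failed verification.
# Reads `journalctl --verify` output on stdin; writes one path per line.
failed_journals() {
    sed -n 's/^FAIL: \(.*\.journal\) (.*)$/\1/p'
}

# Move every failed file into a quarantine directory rather than deleting it.
# $1: destination directory (the default is a hypothetical choice).
quarantine_failed() {
    dest=${1:-/var/log/journal-corrupt}
    mkdir -p "$dest" || return 1
    # Rotate first so that journald stops writing to the active files.
    journalctl --rotate || return 1
    # Merge stderr into stdout so the FAIL: lines reach the pipe either way.
    journalctl --verify 2>&1 | failed_journals | while IFS= read -r f; do
        mv -- "$f" "$dest"/
    done
}
```

Run quarantine_failed as root. Piping journalctl --verify into failed_journals on its own gives a dry run that only lists the affected files.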

Related Solutions

How to clear journalctl

The self-maintenance method is to vacuum the logs by size or time.

Retain only the past two days:

journalctl --vacuum-time=2d

Retain only the past 500 MB:

journalctl --vacuum-size=500M

See man journalctl for more information.
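
Vacuuming is a one-off operation. If the aim is to keep the journal capped permanently, the same limits can also be expressed in journald's configuration. The following is a sketch under assumptions: it relies on the SystemMaxUse= and MaxRetentionSec= settings from journald.conf, assumes your systemd version reads drop-ins from /etc/systemd/journald.conf.d, and uses a hypothetical function name and file name (00-journal-size.conf).

```shell
#!/bin/sh
# Write a journald drop-in that caps the persistent journal.
# $1: drop-in directory (defaults to the usual location).
# The file name 00-journal-size.conf is a hypothetical choice for this sketch.
write_journal_limits() {
    dir=${1:-/etc/systemd/journald.conf.d}
    mkdir -p "$dir" || return 1
    cat > "$dir/00-journal-size.conf" <<'EOF'
[Journal]
SystemMaxUse=500M
MaxRetentionSec=2day
EOF
}
```

After running write_journal_limits as root, restart the daemon with systemctl restart systemd-journald so that the new limits take effect.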

How to get kernel boot log with journalctl

Obviously if the system runs for days I might not get this information. Is my understanding correct here?

Yes. It depends on how much log information is generated, but eventually the boot information will scroll off the beginning of both the kernel's ring buffer and the systemd journal. It's no guide to how long it takes on anyone else's systems, but I have systems with uptimes in the hundreds of days whose boot log data have long since scrolled off the top of the systemd journal. This is one of the disadvantages of having one giant combined log stream that everything fans into and then fans back out from again.

So take a leaf from FreeBSD and NetBSD and their derivatives. They all have services that run once, at bootstrap just after local filesystems have mounted, that simply do:

dmesg > /var/run/dmesg.boot

Thus a snapshot of the kernel log as it was at bootstrap is available in /var/run/dmesg.boot even if it has since scrolled off the actual logs.

You simply need to write a systemd service that does the same. Use the shell for redirection,

ExecStart=/bin/sh -c "exec dmesg > /run/dmesg.boot"

or use something like Laurent Bercot's redirfd or the nosh toolset's fdredir:

ExecStart=/usr/local/bin/fdredir --write 1 /run/dmesg.boot dmesg

Substitute journalctl -k if you want to snapshot the systemd journal rather than just the kernel's log, and make this a Type=oneshot service. Either make it wanted by multi-user.target or make it a DefaultDependencies=no service that is wanted by basic.target. Note that it does not have to be ordered after local filesystem mounts (i.e. local-fs.target). That ordering is necessary for FreeBSD and OpenBSD because /var/run could be a disc filesystem with them. On systemd operating systems /run is an "API filesystem" that is created at bootstrap before any services.
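
Put together, the multi-user.target variant described above might look like the following. It is a sketch only: the function name, the unit name dmesg-boot.service and its Description= text are hypothetical, and the ExecStart= line is the shell-redirection form given earlier.

```shell
#!/bin/sh
# Write a oneshot unit that snapshots the kernel log at boot.
# $1: unit directory (defaults to the administrator's unit directory).
# The unit name dmesg-boot.service is a hypothetical choice for this sketch.
write_dmesg_boot_unit() {
    unit_dir=${1:-/etc/systemd/system}
    mkdir -p "$unit_dir" || return 1
    # Quoted heredoc delimiter: the unit text is written out verbatim.
    cat > "$unit_dir/dmesg-boot.service" <<'EOF'
[Unit]
Description=Snapshot the kernel log at boot

[Service]
Type=oneshot
ExecStart=/bin/sh -c "exec dmesg > /run/dmesg.boot"

[Install]
WantedBy=multi-user.target
EOF
}
```

After running write_dmesg_boot_unit as root, systemctl daemon-reload followed by systemctl enable dmesg-boot.service arranges for the snapshot to be taken on the next boot.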

(The approach that I personally prefer is not to have the giant central log stream in the first place. A dedicated service feeds off the kernel log feed alone and logs to a private log directory. That takes a lot longer to reach the point where last bootstrap information scrolls off the top. And it also contains boot logs from prior boots.

However, this is a lot more complex to set up in a systemd world than a oneshot that writes a /run/dmesg.boot. It is simple in a daemontools family world, though. It's a trivial exercise in the use of tools such as fifo-listen and klog-read, or socklog. Piping the output through a log dæmon that writes to a private, reliably size-capped, auto-rotated, log directory comes as standard with a daemontools/runit/s6/nosh/perp-managed service.)
