Why is the systemd journal not persistent across reboots?

systemd, systemd-journald, vps

I'm experiencing a very weird issue with a fresh Fedora 21 image on a Linode instance. I cannot reproduce it outside of Linode. The issue is that my systemd journal is not persistent across reboots. According to the documentation:

By default, the journal stores log data in /run/log/journal/. Since /run/ is volatile, log data is lost at reboot. To make the data persistent, it is sufficient to create /var/log/journal/ where systemd-journald will then store the data.

I have checked that /var/log/journal exists and I have also set Storage=persistent in /etc/systemd/journald.conf. The log directory contains a bunch of data:

$ du -sh /var/log/journal/
89M /var/log/journal/

The journal, however, only contains log entries since the last system restart:

$ journalctl --list-boots
 0 9f6a5a789dd64ec0b067140905e6da86 Thu 2015-03-19 15:08:48 GMT—Thu 2015-03-19 22:14:37 GMT

Even if I run journalctl --flush before rebooting, the logs are lost. I suspect this is an issue with Linode's Fedora 21 image, and I have opened a support ticket with them. Meanwhile, I continue to search for the cause of this problem.

How can I debug this? What could cause this? What can I do to fix this?

Best Answer

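The reason for this behavior is that the machine identifier in /etc/machine-id changes at every reboot. journald keeps its logs in a subdirectory named after the machine ID, so every new ID starts a new logging directory under /var/log/journal and the logs from earlier boots are left behind in the old directories. Old logs can be viewed with the following command: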

journalctl --merge
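
To see whether this is what is happening on your system, you can compare the directory names under /var/log/journal with the current machine-id. Here is a rough sketch; the function name list_journal_ids and its two optional arguments are my own, not part of any systemd tool:

```shell
#!/bin/sh
# Print every subdirectory of the journal root, marking the one whose name
# matches the current machine-id.
# list_journal_ids is a hypothetical helper; both arguments are optional.
list_journal_ids() {
    journal_root="${1:-/var/log/journal}"
    id_file="${2:-/etc/machine-id}"
    current_id="$(cat "$id_file" 2>/dev/null)"

    for dir in "$journal_root"/*/; do
        [ -d "$dir" ] || continue      # the glob matched nothing
        id="$(basename "$dir")"
        if [ "$id" = "$current_id" ]; then
            echo "$id (current)"
        else
            echo "$id"
        fi
    done
}
```

Calling list_journal_ids with no arguments checks the default locations. More than one directory, with only one marked as current, suggests the machine-id has changed at some point.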

I'm still looking into the cause of the changing machine-id. Linode support is aware of the problem. I will update this answer when I know more.


UPDATE -- The root cause of the problem is simply that Linode emptied /etc/machine-id in their filesystem images, leaving a zero-length file. The result is the following chain of events:

  1. The kernel loads and mounts the root filesystem read-only
  2. systemd, run from the initial ramdisk, tries to read /etc/machine-id from the root filesystem (the file exists but is empty)
  3. systemd cannot read the machine identifier, and it cannot write a new one either, since the root filesystem is mounted read-only
  4. systemd mounts tmpfs on /etc/machine-id (Yes, apparently you can mount a filesystem onto a file)
  5. systemd invokes systemd-machine-id-setup which generates a random machine-id and stores it in the now-volatile /etc/machine-id
  6. The system boots with a volatile machine identifier
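
The chain above starts with an empty file, so it helps to check what state the on-disk machine-id is in. A minimal sketch follows, assuming a valid ID is a line of 32 lowercase hexadecimal characters; the function name machine_id_state is hypothetical:

```shell
#!/bin/sh
# Classify a machine-id file as "missing", "empty", "valid", or "invalid".
# "valid" here means the file contains a line of exactly 32 lowercase
# hexadecimal characters. machine_id_state is a hypothetical helper.
machine_id_state() {
    id_file="${1:-/etc/machine-id}"
    if [ ! -e "$id_file" ]; then
        echo missing
    elif [ ! -s "$id_file" ]; then
        echo empty                     # exists but has zero length
    elif grep -Eqx '[0-9a-f]{32}' "$id_file"; then
        echo valid
    else
        echo invalid
    fi
}
```

On a running affected system /etc/machine-id is the tmpfs copy and will look valid, so this is mostly useful against the real file, for example machine_id_state /mnt/etc/machine-id from a rescue environment.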

You can check if your system has a volatile, rather than a permanent machine-id by looking at the output of mount:

$ mount | grep machine-id
tmpfs on /etc/machine-id type tmpfs (ro,mode=755)
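
If you want the same check in a script, something like the following should work. It is only a sketch: is_tmpfs_mount is a name I made up, and it assumes the usual /proc/self/mounts layout, where the second field is the mount point and the third is the filesystem type.

```shell
#!/bin/sh
# Succeed if the given path appears as a tmpfs mount point in a mounts table.
# is_tmpfs_mount is a hypothetical helper; the table defaults to
# /proc/self/mounts (field 2 = mount point, field 3 = filesystem type).
is_tmpfs_mount() {
    target="${1:-/etc/machine-id}"
    mounts="${2:-/proc/self/mounts}"
    awk -v t="$target" \
        '$2 == t && $3 == "tmpfs" { found = 1 } END { exit !found }' \
        "$mounts"
}
```

For example: if is_tmpfs_mount; then echo "machine-id is volatile"; fi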

The problem is easy to fix: simply write a persistent machine-id to the real /etc/machine-id. This is easier said than done, however, because you cannot unmount tmpfs from /etc/machine-id on a running system. These are the steps I took to fix it on Linode:

  1. Run cp /etc/machine-id /etc/machine-id.copy, then power off the system
  2. In the Linode Manager, go to the Rescue tab and boot into rescue mode
  3. Access the system via the Lish console
  4. Mount the root filesystem: mount /dev/xvda /mnt
  5. Move the copy created in step 1 over the real machine-id, using the paths under the mount point: mv /mnt/etc/machine-id.copy /mnt/etc/machine-id
  6. Reboot
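
The move in rescue mode can be wrapped in a small function that refuses to install anything that does not look like a machine-id. This is a sketch under my own assumptions: the function name restore_machine_id is hypothetical, the root filesystem is assumed to be mounted at /mnt as in the steps above, and a valid ID is assumed to be 32 lowercase hexadecimal characters.

```shell
#!/bin/sh
# Move the saved copy over the real machine-id inside a mounted root
# filesystem. restore_machine_id is a hypothetical helper; the mount point
# defaults to /mnt.
restore_machine_id() {
    root="${1:-/mnt}"
    copy="$root/etc/machine-id.copy"
    dest="$root/etc/machine-id"

    # Refuse to install anything that is not a 32-character hex ID.
    if ! grep -Eqx '[0-9a-f]{32}' "$copy" 2>/dev/null; then
        echo "no valid machine-id found in $copy" >&2
        return 1
    fi
    mv "$copy" "$dest"
}
```

After mounting the root filesystem, restore_machine_id /mnt does the move and returns non-zero if the copy is missing or malformed.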

Such are the consequences of a missing machine-id at boot. I hope this will help a random passer-by in the future.
