Fedora – What causes permission to be denied for mounting rootfs, home, message queue, kernel file system, during boot

boot fedora permissions troubleshooting

My Fedora 27 x64 fails to boot after hard reset. It shows:

Failed to mount POSIX Message Queue File System,
Failed to start Remount Root and Kernel File Systems,
Failed to mount Kernel Debug File System,
Failed to mount Huge Pages File System [3]

and lots of other failures come after these.
See https://photos.app.goo.gl/qBUxT40zA2MTLTwO2

In all these cases

Failed at step EXEC spawning /usr/bin/mount: Permission denied

is given as a reason.
How can it be? Doesn't it recognize its own filesystems?

I have 3 kernels:

vmlinuz-4.14.16-300.fc27.x86_64
vmlinuz-4.15.13-300.fc27.x86_64
vmlinuz-4.15.14-300.fc27.x86_64

No matter which one I try to boot, the same happens.

So far I have:

Checked filesystem integrity with fsck. All partitions are clean.
Checked disk health reported by SMART and performed both – short and long tests. Disk is perfectly healthy.
Rebuilt initramfs: mounted boot, proc, sys, dev in /mnt, chrooted, and ran sudo dracut.

Followed suggestions and:

Performed fsck -f on /dev/mapper/fedora-home, got:

tree extents for i-node 524820 (on level 2) could be narrower. Fix?<y>Y

Allowed to fix this.

Running the same check on /dev/mapper/fedora-root and /dev/sda1 (boot partition) confirmed they are clean. One more error of the same kind was found on an extra partition for data files.

rpm -V --all | grep -v " [cg] " returned as follows:

.M.......    /run/libgpod
..5....T.    /var/lib/selinux/targeted/active/commit_num
.......T.    /var/lib/selinux/targeted/active/file_contexts
.......T.    /var/lib/selinux/targeted/active/homedir_template
S.5....T.    /var/lib/selinux/targeted/active/policy.kern
.M.....T.    /var/lib/selinux/targeted/active/seusers
.M.....T.    /var/lib/selinux/targeted/active/users_extra
.M.......    /var/run/pluto
not exists   /var/run/abrt
.M.......    /var/log/audit
not exists   /usr/lib/systemd/system-preset/85-display-manager.preset
S.5....T.    /usr/share/icons/Crux/icon-theme.cache
S.5....T.    /usr/share/icons/Mist/icon-theme.cache

rpm -V "$(rpm -q --whatprovides /usr/bin/mount)"
.M....G..  g /var/log/lastlog

fixfiles check /usr

libsemanage.semanage_make_sandbox: Error removing old sandbox directory /var/lib/selinux/targeted/tmp. (Read only file system). 
genhomedircon: Could not begin transaction: Read only file system

Among many lines similar to the one below:

Would relabel /usr/src/handbrake/trunk/build/contrib/lib from unconfined_u:object_r:usr_t:s0 to unconfined_u:object_r:lib_t:s0

a few interesting ones are:

Would relabel /usr/sbin/mount.nilfs2 from unconfined_u:object_r:bin_t:s0 to unconfined_u:object_r:mount_exec_t:s0
Would relabel /usr/sbin/umount.nilfs2 from unconfined_u:object_r:bin_t:s0 to unconfined_u:object_r:mount_exec_t:s0
Would relabel /usr/sbin/mkfs.nilfs2 from unconfined_u:object_r:bin_t:s0 to unconfined_u:object_r:fsadm_exec_t:s0

Proved RAM works fine – memtest86 didn't find any errors during 3.5 passes and over 8h test time.

Disabled SELinux (SELINUX=disabled in /etc/selinux/config) and restarted. The system started without any error! This proves the problem is in the SELinux policies. I believe I should start by checking the six SELinux policy files under /var/lib/selinux/targeted/active that have been changed somehow (see the rpm -V --all output above). The question is how to do it wisely.

Checked local modifications to SELinux config files and file_contexts:

semanage module -C -l
Module name              Priority  Language
semanage fcontext -C -l
fcontext SELinux                                 type               Context
/usr/bin/mount                                     all files          system_u:object_r:samba_share_t:s0
/usr/share/dnfdaemon/dnfdaemon-system              all files          system_u:object_r:rpm_exec_t:s0
/var/run/media/przemek/extra(/.*)?                 all files          system_u:object_r:samba_share_t:s0
/var/www/html/photo                                all files          system_u:object_r:httpd_sys_rw_content_t:s0
/var/www/html/photo/_cache                         all files          system_u:object_r:httpd_sys_rw_content_t:s0
/var/www/html/photo/config                         all files          system_u:object_r:httpd_sys_rw_content_t:s0
/var/www/html/photo/content                        all files          system_u:object_r:httpd_sys_rw_content_t:s0
/var/www/html/photo/content/folders.json           all files          system_u:object_r:httpd_sys_rw_content_t:s0
/var/www/html/photo/iv-config/language             all files          system_u:object_r:httpd_sys_rw_content_t:s0

Interestingly, the fcontext of /usr/bin/mount has changed.

The system runs 24h/day as a simple home server (www, mail, etc.).
From time to time (say, once every few weeks) it freezes completely. The HDD keeps writing something (a repetitive, although irregular, sound). There is no reaction to keyboard, mouse, or remote SSH access. Many times I have tried leaving it overnight, but it does not recover, so I am forced to hard reset it each time this happens. This time I didn't wait, and hard reset it after just a few minutes. Unfortunately, since then it cannot boot.

I remembered that, a minute or less before the system froze, a Firefox message box appeared telling me that some script had become unresponsive. I don't remember my choice (kill it/wait).

Hardware: Gigabyte GB-BACE-3160 Brix PC with Hitachi HTS725032A9A364 2.5" HDD and 4GB LPDDR3 RAM (default clock).

More details [here]

Best Answer

What causes permission to be denied for mounting rootfs, home, message queue, kernel file system, during boot?

[...]

In all these cases
Failed at step EXEC spawning /usr/bin/mount: Permission denied
is given as a reason. How can it be? Doesn't it recognize its own filesystems?

The reason it can't mount any of these filesystems is literally that it cannot run the mount program, due to a "Permission denied" error. That's what the log messages you found are saying. I expect you would see the same error if you tried to run mount yourself...

It sounds very much like the system on disk has been corrupted.

Generally, systems which freeze will not be happy, and this is not an entirely unexpected outcome. It is likely you are having a problem with your hardware, a kernel bug, or a hardware bug which the kernel currently does not know how to work around.

Faulty RAM can cause weird things, and it's relatively cheap to replace, so it's definitely worth looking at. You can use e.g. memtest86+ to test the RAM. Thanks to @thecarpy for suggesting this.

If Linux has a problem dealing with some new but otherwise popular hardware, someone else might get the problem fixed in a newer version of Linux. If the hardware has been around for a while, or other Linux users don't use it, it might be very hard to fix.

I used one such system myself, affected by an issue like this one - which seems to have been an issue with very low power-draw Intel CPUs[1][2] - I think the J-series like your Celeron J3160 is somewhat higher power, so I'm not sure whether or not it could be the same issue.

However, if we ignore that entirely and wonder how to look at what the immediate cause of an "impossible" message like this might be, I have two more techniques to suggest. (And then a nitpick about one of the techniques you already used, or at least how you described it to us.)

You can verify against original package checksums using rpm -V "$(rpm -q --whatprovides /usr/bin/mount)", and more widely using rpm -V --all. In the output, you should generally ignore lines which include " c ", as these represent config files which are allowed to be changed. You will see a lot of these when you use --all. Also ignore lines which say " g "; these are "ghost" files that a package owns but does not ship, which includes dynamically generated files and some logfiles. So you might want to use rpm -V --all | grep -v " [cg] ". It's probably worth starting with the first command, which checks only the package that provides mount, because that is a nice quick check. The output format is described at http://ftp.rpm.org/max-rpm/s1-rpm-verify-output.html
Given that you are on Fedora, a second horror to check for regarding "Permission denied", would be corruption of SELinux attributes. The program for this is fixfiles. For example, fixfiles check. This too is likely to provide a horrifying amount of output which is not relevant, because not even Fedora developers actually know how to use SELinux correctly. I feel useful suggestions for this command are even harder to find. fixfiles check /usr should hopefully at least cover the specific error you have with /usr/bin/mount, while also looking for some wider damage, and avoiding the most noisy false positives.
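
As a rough sketch of how the output of these two checks might be trimmed down, the shell functions below filter rpm -V output and summarise the "Would relabel" lines printed by fixfiles check. The function names filter_rpm_verify and relabel_summary are made up for illustration, /etc/example.conf is an invented sample path, and the awk field positions assume that paths contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper: drop rpm -V lines for %config (c) and %ghost (g) files.
filter_rpm_verify() {
    grep -v ' [cg] '
}

# Hypothetical helper: reduce "Would relabel PATH from CTX to CTX" lines
# to "PATH old_type -> new_type". Assumes PATH contains no spaces.
relabel_summary() {
    awk '$1 == "Would" && $2 == "relabel" {
        split($5, old, ":")   # user:role:type:level of the current label
        split($7, new, ":")   # user:role:type:level of the expected label
        print $3, old[3], "->", new[3]
    }'
}

# On the affected system (best run as root; both can take several minutes):
#   rpm -V --all | filter_rpm_verify
#   fixfiles check /usr 2>&1 | relabel_summary

# Demonstration on sample lines; only the icon-theme.cache line survives.
printf '%s\n' \
    'S.5....T.  c /etc/example.conf' \
    '.M....G..  g /var/log/lastlog' \
    'S.5....T.    /usr/share/icons/Crux/icon-theme.cache' |
    filter_rpm_verify

# Prints: /usr/sbin/mount.nilfs2 bin_t -> mount_exec_t
printf '%s\n' \
    'Would relabel /usr/sbin/mount.nilfs2 from unconfined_u:object_r:bin_t:s0 to unconfined_u:object_r:mount_exec_t:s0' |
    relabel_summary
```

The grep pattern is the one suggested above; the awk summary is only a convenience for scanning the current and expected SELinux types side by side.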

So far I have:

Checked disk health reported by SMART and performed both - short and long tests. Disk is perfectly healthy.

Right, it's always good to test SMART health. And if you suspect disk corruption, running one of the tests (I assume long is best) is a good checklist item.

Checked filesystem integrity with fsck. All partitions are clean.

Good thought, but I have a nitpick. To claim that you have confirmed filesystem integrity when faulty hardware appears to be a possibility, you need to say that you ran fsck -f.

(It is possible for the "dirty" state of a journalling filesystem such as ext4, to have been cleared by fsck or the kernel simply replaying the journal. In that case, running fsck without -f will simply report that the filesystem has been marked "clean", without checking any further.)

Also, you should specify that you're using ext4 for your root filesystem (assuming that's the case). If you use btrfs or maybe XFS, it's not necessarily the case that the traditional fsck command will achieve anything.

(In such cases filesystem-specific commands might be more helpful... or not. Basically, the ext2 line of filesystems originally pre-dates journalling and so developed a fsck which was intended to be able to do a thorough integrity check. Other filesystems have different histories.

btrfs and ZFS are "checksumming filesystems", written to detect errors on their own. It would be reasonable to consider running a "scrub" at this point. This would explicitly check data that has been written by the filesystem against the checksum that was generated at the time. (This excludes certain types of files which opt out, for performance reasons).

For example, if data was corrupted in the process of being transmitted to the disk, a "scrub" would detect this, but a SMART test would not.)
