Debian – Force e2fsck on /var at every boot


I'm working with a fanless Debian-based machine. All filesystems are on an SD card.

The /var partition is a separate ext2 filesystem with its own entry in /etc/fstab.

The system doesn't have an on/off switch, so people tend to yank the plug to power-cycle it. This corrupts the /var partition.

I want to force the system to run e2fsck at every boot.

What I've tried:

  1. Don't mount /var at boot; instead, add a script in /etc/rc2.d that runs e2fsck and then mounts the partition.
    Problem: this leaves the system thinking it is stuck at runlevel 6.

  2. Use tune2fs to set the maximum mount count to 1, so that a check is due at every mount.
    Problem: the system often hangs during boot, reports that /var is already mounted, and drops to a maintenance shell.

  3. Set the sixth field (fs_passno) of the /var entry in /etc/fstab to 2, and run touch /forcefsck.
    Problem: neither change, alone or combined, has any noticeable effect. The disk is not checked.

  4. Add noauto to the /var options in /etc/fstab (see #1 above).
    Problem: the system still mounts the partition, so the error message still appears.
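Attempts 1, 3 and 4 each hinge on details that are easy to get wrong by hand. The following Python script is a minimal sketch, not a tested boot script. It parses fstab-style text, reports whether the /var entry has a zero sixth field (fs_passno) or the noauto option, and, for attempt 1, shows how a wrapper could run `e2fsck -f -p` on the unmounted device and decode its exit-status bitmask (as documented in e2fsck(8)) before mounting. All function names are hypothetical. `check_then_mount` assumes it runs as root against a partition that is not yet mounted.

```python
import subprocess
import sys

# Exit-status bits from the e2fsck(8) man page; the status is a bitwise OR.
E2FSCK_BITS = {
    1: "filesystem errors corrected",
    2: "filesystem errors corrected, system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "e2fsck canceled by user request",
    128: "shared-library error",
}


def parse_fstab_entry(fstab_text, mount_point):
    """Return the fields of the fstab line that mounts `mount_point`, or None."""
    for raw in fstab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blank lines and comments
            continue
        fields = line.split()
        if len(fields) < 4:  # malformed: needs at least spec, file, type, options
            continue
        # fstab(5): a missing 5th or 6th field is treated as 0.
        fields += ["0"] * (6 - len(fields))
        spec, fs_file, vfstype, mntops, _freq, passno = fields[:6]
        if fs_file == mount_point:
            return {
                "spec": spec,
                "vfstype": vfstype,
                "options": mntops.split(","),
                "passno": int(passno),
            }
    return None


def audit_entry(entry):
    """List settings in a parsed entry that relate to attempts 3 and 4."""
    if entry is None:
        return ["no fstab entry for this mount point"]
    problems = []
    if entry["passno"] == 0:
        problems.append("sixth field (fs_passno) is 0, so fsck skips it at boot")
    if "noauto" in entry["options"]:
        problems.append("'noauto' is set, so 'mount -a' will not mount it")
    return problems


def describe_e2fsck_status(code):
    """Decode an e2fsck exit status into human-readable messages."""
    if code == 0:
        return ["no errors"]
    return [msg for bit, msg in E2FSCK_BITS.items() if code & bit]


def check_then_mount(device, mount_point):
    """Attempt 1 as a sketch: check the unmounted device, then mount it.

    Assumes root privileges and that `device` is not mounted yet.
    """
    # -f: check even if the filesystem looks clean; -p: fix what is safe without prompting.
    result = subprocess.run(["e2fsck", "-f", "-p", device])
    if result.returncode < 0:  # e2fsck was killed by a signal
        print(f"e2fsck killed by signal {-result.returncode}", file=sys.stderr)
        return False
    for msg in describe_e2fsck_status(result.returncode):
        print(f"e2fsck: {msg}", file=sys.stderr)
    if result.returncode >= 4:  # uncorrected errors, or e2fsck itself failed
        return False
    subprocess.run(["mount", device, mount_point], check=True)
    return True


def main(fstab_path="/etc/fstab"):
    """Print warnings about the /var entry; this does not run e2fsck."""
    try:
        with open(fstab_path) as fh:
            entry = parse_fstab_entry(fh.read(), "/var")
    except OSError as exc:
        print(f"cannot read {fstab_path}: {exc}", file=sys.stderr)
        return 1
    for problem in audit_entry(entry):
        print("warning:", problem)
    return 0


if __name__ == "__main__":
    main()
```

Run with no arguments, the script only prints warnings about the /var entry. A call such as `check_then_mount("/dev/mmcblk0p3", "/var")` would be the attempt-1 step. The device name there is a hypothetical SD-card partition and must be replaced with the real one.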

Any suggestions on other things to try?


EDIT:

Some background:

  1. We have 150+ of these systems deployed in remote locations
  2. Systems in question do not have power on/off switches
  3. Systems are often (erroneously) put on switched power sources (wall switches or other)
  4. Loss of power to location in question is not uncommon

Best Answer

This question has already been answered:

How to force fsck at every boot - all (relevant) filesystems?

No one there pointed out that the real problem is people yanking the cable. I seriously think the focus of BOTH questions is wrong: you need to fix your user problem, not the server's filesystem problem.

Honestly, given how crucial this filesystem is to the basic functionality of the machine, your best bet is to stop thinking about this problem like a sysadmin and start thinking about it like a manager.

In other words:

  • Teach your users how to reboot this system properly, so the /var corruption never happens in the first place. Documentation is your friend, as they say. This is not an ideal solution, for several reasons, but at least it keeps them from frying filesystems. If nothing else, they shouldn't be touching the damn thing at all if it's your job to keep it running.
  • Lock it away where they can't get to it. Seriously, if this is a server storing important data, why is this not already the case? Is this a development system whose developers just don't know what they're doing or how damaging this can be?? If so, again, teach them. It's not your job to fix stupid; it's your job to prevent stupid.
  • Tell them to leave it the hell alone and come talk to you if there are problems. :)
  • Low-tech, but possibly helpful (though certainly a fire hazard): duct-tape the shit out of both ends of the power cable so they have to spend 15 minutes trying to untape it. Hopefully, after five minutes and on layer 26 of tape, they'll get frustrated and do what they're supposed to do: talk to you to fix the actual problem that's motivating them to pull the plug in the first place.

What is it about this machine that makes it so unstable that they think they need to reboot it?? It's a Debian system. They don't need "reboots", so what else is wrong with it?? Are they worried about power consumption, or are there broken, unstable services on it that only a reboot can fix? If it's the latter, then your question is beside the point and you have other work to do, sorry to say.

If nothing else, you could pitch the idea of treating it gently, instead of rebooting it by pulling the cable, as an exercise in energy conservation. Do you really want to get up from your desk to pull a power cable rather than just sit there, log in, and reboot it from the command line?? It takes about 2 seconds of work that way, versus getting up, grumbling the entire way to the device, yanking the cable, plugging it back in, waiting for it to come back up broken, and then waiting even longer for /var to be fscked.

The get-up, yank-the-cable, wait-for-/var-to-fix-itself cycle takes far longer, is far more complex to maintain in the long run, will cause you all kinds of pain, has already led you to ask the wrong questions, and will ultimately leave you at the top of a bell tower with a loaded weapon and a death wish.

Fix it right by fixing your users, or mitigate the damage by making it extremely hard for them to accomplish stupid. I can't stress the importance of this enough.
