I'm working with a fanless Debian-based machine. All filesystems are on an SD card.
The /var partition is a separate ext2 filesystem with its own entry in /etc/fstab.
The system doesn't have an on/off switch, so people tend to yank the plug to power cycle it. This leads to corruption on the /var partition.
I want to force the system to run e2fsck at every boot.
What I've tried:
1. Don't mount /var at boot. Add a script in /etc/rc2.d to run e2fsck and then mount the partition.
   Problem: This gives me a system which thinks it's stuck at runlevel 6. See here.
2. Use tune2fs to set the maximum mount count to one, so that a check is due at every mount.
   Problem: The system often hangs during boot, complaining that /var is already mounted, and drops to a maintenance shell.
3. Set the sixth field (fs_passno) of the /var entry in /etc/fstab to 2 and run touch /forcefsck.
   Problem: Neither step, alone or combined, has any noticeable effect. The disk is not checked.
4. Add noauto to the /var entry in /etc/fstab (see #1 above).
   Problem: The system still mounts the partition, so the error message still pops up.
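For anyone reproducing the third attempt: it depends on the last column of the /var line in /etc/fstab (the fsck pass number, fs_passno), where 0 means the filesystem is never checked at boot. A minimal sketch for confirming what the system actually sees is below. The function name fstab_passno is made up for illustration, and this only reads the file; it does not force or run a check.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): print the sixth field
# (fs_passno) of the fstab entry for a given mount point.
# Usage: fstab_passno MOUNTPOINT [FSTAB_FILE]
fstab_passno() {
    mountpoint=$1
    fstab=${2:-/etc/fstab}
    # Skip comment lines, match on the second field (the mount point),
    # and treat a missing sixth field as 0, i.e. "never check at boot".
    awk -v mp="$mountpoint" '
        /^[[:space:]]*#/ { next }
        NF >= 2 && $2 == mp { print (NF >= 6 ? $6 : 0); exit }
    ' "$fstab"
}
```

Running `fstab_passno /var` should print 2 if the edit from the third attempt is really in place; 0 or no output would mean the boot-time check has nothing to act on for /var.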
Any suggestions on other things to try?
EDIT:
Some background:
- We have 150+ of these systems deployed in remote locations
- Systems in question do not have power on/off switches
- Systems are often (erroneously) put on switched power sources (wall switches or other)
- Loss of power to location in question is not uncommon
Best Answer
This question has already been answered:
How to force fsck at every boot - all (relevant) filesystems?
No one pointed out there that the real problem is people yanking the cable. I seriously think the focus of both questions is wrong: you need to fix your user problem, not the server filesystem problem.
Honestly, given how crucial this filesystem is to the basic functionality of the machine, your best bet is to stop thinking about this problem like a sysadmin and start thinking about it like a manager.
In other words:
What is it about this machine that makes it so unstable that they think it's necessary to reboot it? It's a Debian system. It shouldn't need reboots, so what else is wrong with it? Are they worried about power consumption, or are there broken and unstable services on it that only a reboot can fix? If it's the latter, then your question is irrelevant and you have other work to do, sorry to say.
If nothing else, you could pitch your request to treat the machine well, and not reboot it by pulling the cable, as an exercise in energy conservation. Do you really want to get up from your desk to pull a power cable rather than just sit there, log in, and reboot it from the command line? It takes about two seconds of work to do it that way, versus getting up, grumbling all the way to the device, yanking the cable, plugging it back in, waiting for it to come back up broken, and then having to wait even longer for /var to be fscked.
The get-up, yank-cable, wait-for-/var-to-fix-itself cycle takes far longer, is far more complex to maintain in the long run, will cause all kinds of pain on your part, has already motivated you to ask the wrong questions, and will ultimately lead to you at the top of a bell tower with a love weapon and a death wish.
Fix it right by fixing your users, or mitigate the damage by making it extremely challenging for them to accomplish stupid. I can't be any clearer about the importance of this.