I have a small server running Ubuntu 10.04, which I manage from another computer via ssh, and I tried to use nfs on it to share files. That mostly works, until one of the clients unmounts and I want to shut down nfs-kernel-server on the server. While the stopping seems proper:
$ sudo service nfs-kernel-server stop
* Stopping NFS kernel daemon [ OK ]
* Unexporting directories for NFS kernel daemon... [ OK ]
… I do get something like this in the log:
Feb 5 11:50:17 user init: statd main process (3806) killed by KILL signal
Feb 5 11:50:17 user init: statd main process ended, respawning
Feb 5 11:50:17 user init: idmapd main process (3808) killed by KILL signal
Feb 5 11:50:17 user init: idmapd main process ended, respawning
Feb 5 11:50:17 user statd-pre-start: local-filesystems started
Feb 5 11:50:17 user sm-notify[3815]: Already notifying clients; Exiting!
Feb 5 11:50:17 user rpc.statd[3830]: Version 1.1.6 Starting
Feb 5 11:50:17 user rpc.statd[3830]: Flags:
… meaning that some processes related to nfs ignored the stop request and respawned. If at this point I try to run sudo service nfs-kernel-server start (again via ssh), that command freezes, and in /var/log/syslog I get this:
Feb 5 11:43:55 user mountd[2045]: authenticated mount request from 192.168.0.2:1005 for /media/disk (/media/disk)
Feb 5 11:45:19 user mountd[2045]: Caught signal 15, un-registering and exiting.
Feb 5 11:45:19 user kernel: [27428.148368] nfsd: last server has exited, flushing export cache
Feb 5 11:45:19 user kernel: [27428.148431] BUG: Dentry d0bc8b28{i=1f6,n=} still in use (1) [unmount of vfat sdd8]
Feb 5 11:45:19 user kernel: [27428.148473] ------------[ cut here ]------------
Feb 5 11:45:19 user kernel: [27428.148481] kernel BUG at /build/buildd/linux-2.6.32/fs/dcache.c:670!
Feb 5 11:45:19 user kernel: [27428.148491] invalid opcode: 0000 [#1] SMP
Feb 5 11:45:19 user kernel: [27428.148501] last sysfs file: /sys/devices/system/cpu/cpu1/cpufreq/scaling_cur_freq
...
Feb 5 11:45:19 user kernel: [27428.148807] Call Trace:
Feb 5 11:45:19 user kernel: [27428.148824] [<c024c780>] ? vfs_quota_off+0x0/0x20
Feb 5 11:45:19 user kernel: [27428.148838] [<c021d4fc>] ? shrink_dcache_for_umount+0x3c/0x50
Feb 5 11:45:19 user kernel: [27428.148852] [<c020d090>] ? generic_shutdown_super+0x20/0xe0
...
Feb 5 11:45:19 user kernel: [27428.149511] EIP: [<c021d4a9>] shrink_dcache_for_umount_subtree+0x249/0x260 SS:ESP 0068:ccc6de6c
Feb 5 11:45:19 user kernel: [27428.149631] ---[ end trace 6198103bb62887ac ]---
Feb 5 11:49:53 user init: idmapd main process (838) killed by TERM signal
Feb 5 11:49:53 user init: idmapd main process ended, respawning
Feb 5 11:49:53 user rpc.statd[769]: Caught signal 15, un-registering and exiting.
Feb 5 11:49:53 user init: statd main process ended, respawning
Feb 5 11:49:53 user statd-pre-start: local-filesystems started
Feb 5 11:49:53 user sm-notify[3790]: Already notifying clients; Exiting!
Feb 5 11:49:53 user rpc.statd[3806]: Version 1.1.6 Starting
Feb 5 11:49:53 user rpc.statd[3806]: Flags:
...
Now, the thing is this: after this bug happens, the server's ssh server is (for some reason) usually still "live", so I can log in via ssh again and try to kill processes (and realize it is impossible to kill /usr/sbin/rpc.nfsd 8, which is the one hanging).
BUT if at this point I try to issue a reboot via sudo shutdown -r now && exit from ssh, then this server PC will start the reboot process but will not complete it; it will drop to a terminal, dump some error messages, and stay there :(
The problem is that the server PC is in a hard-to-reach location, and having to go there to do Alt+SysRq + REISUB to properly reboot (if the kernel reacts to that key combo; otherwise it's a hard powerdown) is a real burden.
So my question is: is there some "hardcore reboot" command in Linux that will more or less "guarantee" that the PC will reboot (and not just hang/freeze), even if it has encountered a kernel bug, and which I could issue via ssh? Something that would be the equivalent of a hard powerdown (i.e. turning off the power by e.g. holding the power button for 10+ seconds) and hard powerup?
Best Answer
To ensure that the system will reboot no matter what, I always do this sequence:
This requests the kernel to do:
o
for poweroff. See e.g. here for an explanation of this feature.
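The command sequence itself is not present in the text above. What follows is only a minimal sketch, assuming the answer refers to the kernel's magic SysRq interface (the same mechanism behind the Alt+SysRq keys mentioned in the question), driven through /proc instead of the keyboard. The helper names sysrq_write and hard_reboot and the HARD_REBOOT_DRY_RUN switch are made up for illustration; by default the script only prints what it would do.

```shell
#!/bin/sh
# Sketch: force an immediate reboot through the magic SysRq interface.
# Assumption: this is the mechanism the answer has in mind.
# HARD_REBOOT_DRY_RUN is a hypothetical safety switch; it defaults to 1,
# so nothing is written to /proc unless it is explicitly set to 0.
HARD_REBOOT_DRY_RUN="${HARD_REBOOT_DRY_RUN:-1}"

# sysrq_write VALUE FILE: write VALUE to FILE, or just report it in dry-run mode.
sysrq_write() {
    if [ "$HARD_REBOOT_DRY_RUN" = "0" ]; then
        echo "$1" > "$2"
    else
        echo "would write '$1' to $2"
    fi
}

hard_reboot() {
    # Enable all SysRq functions.
    sysrq_write 1 /proc/sys/kernel/sysrq
    # 'b' reboots immediately, without syncing or unmounting disks.
    # Writing 'o' instead powers the machine off.
    sysrq_write b /proc/sysrq-trigger
}

hard_reboot
```

To actually reboot, the script would have to run as root with HARD_REBOOT_DRY_RUN=0. Since 'b' skips syncing and unmounting, unwritten data can be lost, just as with holding the power button; writing 's' (sync) and 'u' (remount read-only) to /proc/sysrq-trigger first reduces that risk when the kernel is still able to service them.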