Arch Linux on ARM – Random USB drive “freeze”

arch-linux arm freeze linux usb-storage

I'm running Linux on a Seagate FreeAgent DockStar, a very limited machine but more than able to do what I need, at least most of the time…

I have the operating system on a flash drive and use an external USB 2.0 "classic magnetic" Western Digital 1.5 TB hard disk for mass storage.

Quite often the I/O wait percentage suddenly climbs to almost 100% and the system is brought to its knees, to the point that it is very difficult even to ssh into it. A typical 'iostat -x' in those situations gives output like:

Device:  rrqm/s  wrqm/s   r/s   w/s  rkB/s  wkB/s  avgrq-sz  avgqu-sz     await  r_await   w_await    svctm   %util
sdb        0.00    0.00  0.00  0.00   0.00   0.00      0.00      0.00      0.00     0.00      0.00     0.00    0.00
sda        0.00    0.00  0.00  0.50   0.00   2.00      8.00     14.80  91400.00     0.00  91400.00  2000.00  100.00

where sdb is the flash drive and sda is the USB disk. This tells me that the USB drive is 100% busy even though almost nothing is reading from or writing to it (the await column shows requests waiting about 91 seconds on average).
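The stall pattern described here (a device near 100% utilisation with almost no requests completing) can be picked out of iostat output automatically. This is a minimal sketch, assuming the 14-column 'iostat -x' layout shown above; other sysstat versions arrange the columns differently, so the field numbers would need adjusting. The function name find_stalled and its two thresholds are hypothetical.

```shell
#!/bin/sh
# find_stalled [UTIL_MIN] [IOPS_MAX]
# Reads `iostat -x` output on stdin and prints devices whose %util is at
# least UTIL_MIN (default 99) while r/s + w/s is below IOPS_MAX (default 1).
# Assumed column layout (from the output above):
#   $1 device, $4 r/s, $5 w/s, $10 await (ms), $14 %util
find_stalled() {
    awk -v util_min="${1:-99}" -v iops_max="${2:-1}" '
        $1 ~ /^Device/ { next }          # skip the header line
        NF >= 14 && $14 + 0 >= util_min && ($4 + $5) < iops_max {
            printf "%s stalled: util=%s%% r/s=%s w/s=%s await=%s ms\n",
                   $1, $14, $4, $5, $10
        }'
}

# Example (commented out): sample every 5 seconds, forcing "." as the
# decimal separator so awk parses the numbers correctly.
# LC_ALL=C iostat -x 5 | find_stalled
```

Fed the two device lines above, this prints only the sda line, since sdb is idle rather than stalled.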

I also ran 'lsof +D ' during "normal" use and found nothing suspicious: a fair number of files are open, but nothing strange.

How can I debug this further? Keep in mind that the machine uses an ARM processor, has only 128 MB of RAM, and has no screen or local console; within these limits, I can install almost anything if needed.

Edit: I also tried running smartctl, which says the drive is healthy:

SMART overall-health self-assessment test result: PASSED

… there is a lot of output, but none of it seems useful.

Edit2:

I really think that the drive has hardware issues. I noticed that when it is 100% busy without any load, it makes a repetitive noise that reminds me of old Amiga floppy drives, which made a similar noise, as if they were spinning aimlessly…

For this reason, following the suggestion to move /var and /tmp to the magnetic disk only made the situation worse.

I guess the only way to solve this is to buy a new hard drive and back up valuable data ASAP. 🙁

Best Answer

Double-check that the WD drive is OK. I have just discovered that one of mine has read errors: it was taking enormous amounts of time (minutes) to read some bad sectors. Unfortunately, SMART may not work over USB, which makes checking the drive's condition hard.

One way to check is to read the whole disk using dd:

dd if=/dev/sdX of=/dev/null bs=1M

That will take quite some time over USB, but if the dd command reports errors, you know the disk is failing. You can read the disk while it is mounted, but be careful not to mix up if= and of=!
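To see roughly where on the disk the reads fail or slow down, the same dd read can be done in chunks. This is a minimal sketch rather than a definitive tool: scan_chunks and its arguments are hypothetical names, the total size in MiB must be supplied by the caller, and the timing has only one-second resolution (it relies on 'date +%s', which GNU and BusyBox date support).

```shell
#!/bin/sh
# scan_chunks SOURCE TOTAL_MIB [CHUNK_MIB] [SLOW_SECONDS]
# Reads SOURCE (a device or file) in chunks of CHUNK_MIB MiB (default 64),
# discarding the data, and reports every chunk where dd fails or where the
# read takes at least SLOW_SECONDS seconds (default 10).
scan_chunks() {
    src=$1
    total=$2
    chunk=${3:-64}
    slow=${4:-10}
    offset=0
    bad=0
    while [ "$offset" -lt "$total" ]; do
        start=$(date +%s)
        # Read-only: the device is only ever used as if=, never as of=.
        if dd if="$src" of=/dev/null bs=1M skip="$offset" count="$chunk" 2>/dev/null; then
            elapsed=$(( $(date +%s) - start ))
            if [ "$elapsed" -ge "$slow" ]; then
                echo "SLOW (${elapsed}s) at ${offset} MiB"
            fi
        else
            echo "READ ERROR at ${offset} MiB"
            bad=$((bad + 1))
        fi
        offset=$((offset + chunk))
    done
    echo "done: $bad failed chunk(s)"
}
```

As a usage example, assuming blockdev from util-linux is installed and the script runs as root, the size can be taken from the device itself: scan_chunks /dev/sdX "$(( $(blockdev --getsize64 /dev/sdX) / 1048576 ))". Smaller chunks narrow down the bad region at the cost of more dd invocations.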
