Linux – Changing HDD State from ReadOnly After a Temporary Crash

linux mount readonly

At this time there is no answer to this problem.

Usually, after some read or write problems on a block device, the kernel decides to flag the WHOLE DEVICE as read-only. After that, any write to any partition or filesystem on this device causes that filesystem to be switched to read-only as well, because no writes are possible.

Example from dmesg. This is a simulation with a Linux guest on a Windows 8 host using VirtualBox, while a defragmenter is working on the guest's disk image:

[11903.002030] ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[11903.003179] ata3.00: failed command: READ FPDMA QUEUED
[11903.003364] ata3.00: cmd 60/08:00:a8:77:57/00:00:00:00:00/40 tag 0 ncq 4096 in
[11903.003385]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[11903.004074] ata3.00: status: { DRDY }
[11903.004248] ata3: hard resetting link
[11903.325703] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[11903.327097] ata3.00: configured for UDMA/133
[11903.328025] ata3.00: device reported invalid CHS sector 0
[11903.329664] ata3: EH complete
[11941.000472] ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[11941.000769] ata3.00: failed command: READ FPDMA QUEUED
[11941.000952] ata3.00: cmd 60/08:00:c8:77:57/00:00:00:00:00/40 tag 0 ncq 4096 in
[11941.000961]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[11941.001353] ata3.00: status: { DRDY }
[11941.001504] ata3: hard resetting link
[11941.320297] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[11941.321252] ata3.00: configured for UDMA/133
[11941.321379] ata3.00: device reported invalid CHS sector 0
[11941.321553] ata3: EH complete
[11980.001746] ata3.00: exception Emask 0x0 SAct 0x11fff SErr 0x0 action 0x6 frozen
[11980.002070] ata3.00: failed command: WRITE FPDMA QUEUED
[11980.002255] ata3.00: cmd 61/18:00:28:23:59/00:00:00:00:00/40 tag 0 ncq 12288 out
[11980.002265]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
-------------------
There are many other errors, such as "lost page write", "Journal has aborted", "Buffer I/O error", "hard resetting link" and more.
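
To pick these messages out of a long kernel log, a filter along the following lines may help. This is only a sketch: the function name storage_errors is made up for illustration, and the pattern list is taken from the messages quoted above, so it is not a complete list of storage errors.

```shell
#!/bin/sh
# Keep only kernel log lines that look like the storage errors quoted above.
# storage_errors is a hypothetical helper name; the patterns are examples,
# not an exhaustive list.
storage_errors() {
    grep -i -E 'failed command|hard resetting link|Buffer I/O error|journal has aborted|\(timeout\)'
}

# Typical use (reading the kernel log may require root):
#   dmesg | storage_errors
```

The match is case-insensitive because the exact capitalization of these messages differs between drivers and kernel versions.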

After this, a remount fails with:

mount / -o remount,rw
mount: cannot remount block device /dev/sda1 read-write, is write-protected

because the WHOLE device sda, which holds the rootfs on sda1, is READ-ONLY.
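
Whether the kernel really has the device flagged read-only can be checked through sysfs. A minimal sketch, assuming the disk is called sda; the SYSFS_ROOT variable and the ro_state function are hypothetical names, added only so the check can be tried against a fake directory tree.

```shell
#!/bin/sh
# Report whether the kernel has flagged a block device read-only.
# SYSFS_ROOT is a hypothetical override for experimenting; on a real
# system it stays at /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

ro_state() {
    # $1 is a device name such as sda (an assumption, use your own)
    flag_file="$SYSFS_ROOT/block/$1/ro"
    if [ ! -r "$flag_file" ]; then
        echo "unknown"
        return 1
    fi
    # The ro attribute contains 1 for read-only and 0 for read-write.
    if [ "$(cat "$flag_file")" = "1" ]; then
        echo "read-only"
    else
        echo "read-write"
    fi
}

# Typical use:
#   ro_state sda
# The same flag can be queried with util-linux:
#   blockdev --getro /dev/sda
```

Note that a filesystem can also be remounted read-only by its own error handling while the block device flag stays 0, so this check only covers the device-level flag described here.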

In my experience this occurs in the following situations:

  1. The HDD is really damaged. The write errors returned depend on the condition of the HDD
  2. The host machine is overloaded, so writes to the Linux guest's virtual HDD time out
  3. The FC cable or SAN device (disk array over Fibre Channel) is overloaded
  4. The connection over FC or FCoE is lost for a moment. Maybe an FC packet was lost or timed out

In these situations the device really is read-write, but the Linux kernel marks it internally as read-only and treats it as read-only. This kernel behavior is meant to prevent damage, but it is only useful in case 1.

The question is: how do I manually tell the kernel that the HDD block device is operating normally?

Without this, the kernel serves the device as read-only, like a CD-ROM, and no other command has a chance of working properly, including mount/remount as read-write, fsck and others.

Unusable answers, which really qualify as spam, from people who want to help but don't understand the nature of the problem:

  1. Try remounting as read-write (impossible, the device is R-O)
  2. fsck it (what for? the device is R-O, no repair is possible)
  3. 'I don't know' (the first one that makes sense, but unusable)
  4. 'Replace your device' (usually the problem is something else)

Does anybody have a recipe for the question above? Some way to switch the flag on a writable block device so that it reverts from read-only to read-write?
At this time it seems that no one knows how.

There are some workarounds, but they are usually semi-usable or unusable:

  1. Remove the kernel module that provides access to the affected HDD or storage array. Unfortunately, the damaged device usually holds the rootfs, or the same driver serves both the damaged device and the device that holds the rootfs
  2. Remove FC access to the device and attach it again (fctools). Not always possible, does not always work.
  3. Restart the WHOLE machine. Usually this is the only option that is always possible, and we are always forced into it.

In points 1 and 2 we tell the kernel that we completely disconnect the device and then connect it again. The kernel treats this as attaching a new, properly working device. The same can be simulated with a USB device by cutting its power for a moment. Point 3 is the last resort and usually works. But why should we have to restart everything?
Unfortunately, in all three cases we lose all journal updates and dirty buffers.
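
For SCSI/SATA disks, the "disconnect and connect again" idea from points 1 and 2 can sometimes be done through sysfs without unloading a module. The following is only a sketch under assumptions: the disk name (sdb), the host name (host2) and the helper names write_sysfs and reattach_scsi_disk are made up for illustration, and it defaults to a dry run that just prints what it would do. It does not help when the device holds the rootfs, and, as with the other workarounds, unwritten data for the device is lost.

```shell
#!/bin/sh
# Drop a SCSI disk from the kernel and ask its host adapter to rescan.
# DRY_RUN=1 (the default in this sketch) only prints the actions.
DRY_RUN="${DRY_RUN:-1}"

write_sysfs() {
    # $1 = value to write, $2 = sysfs attribute file
    if [ "$DRY_RUN" = "1" ]; then
        echo "echo '$1' > $2"
    else
        printf '%s\n' "$1" > "$2"
    fi
}

reattach_scsi_disk() {
    # $1 = disk name (e.g. sdb), $2 = SCSI host (e.g. host2); both are
    # placeholders, look up the real ones under /sys/block and
    # /sys/class/scsi_host before running this for real.
    disk="$1"
    host="$2"
    # Tell the kernel to forget the device...
    write_sysfs 1 "/sys/block/$disk/device/delete" || return 1
    # ...then rescan all channels, targets and LUNs on that host.
    write_sysfs "- - -" "/sys/class/scsi_host/$host/scan"
}

# Typical use, after unmounting everything on the disk, as root:
#   DRY_RUN=0 reattach_scsi_disk sdb host2
```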

Note that in the same situations I have no problems with Windows (desktop and server).

Best Answer

Try blockdev --setrw or hdparm -r 0 on the affected device.
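
A possible sequence built around that suggestion is sketched below. The device /dev/sda and mount point / are assumptions taken from the question, the helper names run and clear_readonly are hypothetical, and the sketch defaults to a dry run that only prints the commands. Whether clearing the flag is enough depends on why the kernel set it, so this is not a guaranteed fix.

```shell
#!/bin/sh
# Clear the kernel's read-only flag on a block device, then remount.
# DRY_RUN=1 (the default in this sketch) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

clear_readonly() {
    # $1 = block device (e.g. /dev/sda), $2 = mount point (e.g. /)
    dev="$1"
    mnt="$2"
    # Clear the read-only flag; hdparm -r 0 "$dev" is the alternative
    # named in the answer.
    run blockdev --setrw "$dev" || return 1
    # Print the flag again: 0 means the device is now read-write.
    run blockdev --getro "$dev" || return 1
    # Retry the remount that failed before.
    run mount -o remount,rw "$mnt"
}

# Typical use, as root:
#   DRY_RUN=0 clear_readonly /dev/sda /
```

If the partition itself (for example /dev/sda1) is also flagged read-only, the same blockdev call may need to be repeated for it.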
