Linux – Attempt to test corruption of ZFS filesystem using dd fails

dd · linux · zfs

I'm trying to test error detection and recovery on a system with recently installed ZFS. I deliberately overwrite the start of one of the disks with dd but can't force any errors to be detected.

Can I not use dd to do this? Or am I just not hitting any data?

I've created mirrored zfs pool and copied some data to it:

$ zpool status
  pool: zfspool
 state: ONLINE
  scan: scrub repaired 0 in 0h6m with 0 errors on Sun Dec  1 11:53:12 2013
config:

    NAME                                          STATE     READ WRITE CKSUM
    zfspool                                       ONLINE       0     0     0
      mirror-0                                    ONLINE       0     0     0
        ata-WDC_WD10EFRX-68JCSN0_WD-WCC1U4257356  ONLINE       0     0     0
        ata-WDC_WD10EFRX-68JCSN0_WD-WCC1U4299344  ONLINE       0     0     0

errors: No known data errors

I then attempt to corrupt one of the disks:

$ dd of=/dev/sdb if=/dev/zero bs=512 count=10000
10000+0 records in
10000+0 records out
5120000 bytes (5.1 MB) copied, 0.103375 s, 49.5 MB/s

and force a read of all data:

$ diff -qr /home/archive/ /zfspool/archive/

I would have thought this would find the corruption on one of the drives, flag it as invalid and set the pool status to degraded, but:

$ zpool status
  pool: zfspool
 state: ONLINE
  scan: scrub repaired 0 in 0h6m with 0 errors on Sun Dec  1 11:53:12 2013
config:

    NAME                                          STATE     READ WRITE CKSUM
    zfspool                                       ONLINE       0     0     0
      mirror-0                                    ONLINE       0     0     0
        ata-WDC_WD10EFRX-68JCSN0_WD-WCC1U4257356  ONLINE       0     0     0
        ata-WDC_WD10EFRX-68JCSN0_WD-WCC1U4299344  ONLINE       0     0     0

errors: No known data errors

OK, let's test it with a scrub:

$ zpool scrub zfspool
$ zpool status
      pool: zfspool
     state: ONLINE
      scan: scrub repaired 0 in 0h6m with 0 errors on Sun Dec  1 12:46:34 2013
    config:

        NAME                                          STATE     READ WRITE CKSUM
        zfspool                                       ONLINE       0     0     0
          mirror-0                                    ONLINE       0     0     0
            ata-WDC_WD10EFRX-68JCSN0_WD-WCC1U4257356  ONLINE       0     0     0
            ata-WDC_WD10EFRX-68JCSN0_WD-WCC1U4299344  ONLINE       0     0     0

    errors: No known data errors

Additional information requested by @rickhg12hs:

$ fdisk -l /dev/sda

WARNING: GPT (GUID Partition Table) detected on '/dev/sda'! The util fdisk doesn't support GPT. Use GNU Parted.


Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
256 heads, 63 sectors/track, 121126 cylinders
Units = cylinders of 16128 * 512 = 8257536 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1      121127   976762583+  ee  GPT
Partition 1 does not start on physical sector boundary.
[17:37:26][root@zserver2:~]$ fdisk -l /dev/sdb

WARNING: GPT (GUID Partition Table) detected on '/dev/sdb'! The util fdisk doesn't support GPT. Use GNU Parted.


Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

[17:38:11][root@zserver2:~]$ mount
...
zfspool on /zfspool type zfs (rw,xattr)
zfspool/archive on /zfspool/archive type zfs (rw,xattr)

Best Answer

Errors were detected and fixed with the scrub.

Before that, you hadn't attempted any writes, just reads, so everything was served from the ARC (i.e. cached in RAM) and the on-disk corruption went undetected.
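If you want ordinary reads (rather than a scrub) to catch the corruption, you first have to make sure the reads actually hit the disks. A minimal sketch, assuming the pool name from the question; exporting and re-importing the pool evicts its data from the ARC, so the next reads come from disk:

$ zpool export zfspool
$ zpool import zfspool
$ diff -qr /home/archive/ /zfspool/archive/   # reads now come from disk, not the ARC
$ zpool status zfspool                        # detected (and repaired) errors show up in the CKSUM column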

Edit: I overlooked the "0 errors" reported by the scrub. Here is a corrected explanation of what likely happened:

You overwrote ~5 MB at the beginning of the disk with zeroes.

  • The first 3.5 MB were harmless: ZFS reserves that area for non-ZFS use and never reads or writes anything there.
  • The next 0.5 MB overwrote two of the four vdev labels (see the zdb sketch after this list).
  • The remaining ~1 MB was written in an area that may not have contained any data or metadata.
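To see which labels actually survived on the overwritten disk, you can dump them with zdb. A minimal sketch; the device path is an assumption (when given a whole disk, ZFS on Linux normally puts the data on partition 1, so the matching by-id ...-part1 path should work as well):

$ zdb -l /dev/sdb1    # prints the vdev labels stored on that device; the overwritten ones should fail to read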

The corruption of the vdev labels went unnoticed because of their high redundancy (six of the eight labels in the mirror were still healthy) and because the labels are overwritten atomically anyway.
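So, to answer the original question: dd is perfectly usable for this kind of test, you just have to hit allocated data instead of the reserved front of the disk, and then make ZFS read it back. A minimal sketch, assuming the same /dev/sdb as above; the seek offset and count are arbitrary and simply need to land somewhere inside the space holding your copied data:

$ dd if=/dev/urandom of=/dev/sdb bs=1M seek=100 count=100   # clobber ~100 MB well past the reserved area
$ zpool scrub zfspool                                       # wait for the scrub to finish
$ zpool status -v zfspool                                   # the damaged disk should now show CKSUM errors and repairs

If the scrub still reports nothing, increase count or pick another seek offset: only blocks that are actually allocated carry checksums that can fail.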
