Linux – Does ZFS for Linux over-stress VirtualBox?

linux, sata, virtualbox, zfs

I've been using MD raid + LVM for many years, but recently decided to take a look at ZFS. In order to try it, I created a VirtualBox VM with a layout similar to my main server's – 7 'SATA' drives of various sizes.

I set it up with an approximation of my current MD+LVM configuration and worked out the steps needed to rearrange files, LVs, VGs etc. to make space to try ZFS. All seemed fine: over three days of uptime I moved and rearranged PVs until I had the space set up.
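The PV shuffling itself is standard LVM juggling; the commands involved look roughly like this (the volume group and device names are illustrative):

pvmove /dev/sdf1          # migrate all extents off the PV being freed
vgreduce vg0 /dev/sdf1    # remove the now-empty PV from the volume group
pvremove /dev/sdf1        # wipe the PV label so the partition can be reused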

Finally, I created the first ZPool:

  pool: tank
 state: ONLINE
  scan: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    tank        ONLINE       0     0     0
      raidz1-0  ONLINE       0     0     0
        sdb1    ONLINE       0     0     0
        sdc1    ONLINE       0     0     0
        sdd1    ONLINE       0     0     0
        sde1    ONLINE       0     0     0
        sdg1    ONLINE       0     0     0

errors: No known data errors
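For reference, a raidz1 pool across those five partitions is created with a single command along these lines (the original invocation isn't shown above):

zpool create tank raidz1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdg1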

I created a couple of ZFS datasets and started copying files onto them using both cp and tar, e.g.:

cd /data/video; tar cf - . | (cd /tank/video; tar xvf -)
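Creating the datasets is a one-liner each; the name here is inferred from the target path above:

zfs create tank/video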

I then noticed that I was getting SATA errors in the virtual machine, although the host system showed no errors.

Apr  6 10:24:56 model-zfs kernel: [291246.888769] ata4.00: exception Emask 0x0 SAct 0x400 SErr 0x0 action 0x6 frozen
Apr  6 10:24:56 model-zfs kernel: [291246.888801] ata4.00: failed command: WRITE FPDMA QUEUED
Apr  6 10:24:56 model-zfs kernel: [291246.888830] ata4.00: cmd 61/19:50:2b:a7:01/00:00:00:00:00/40 tag 10 ncq 12800 out
Apr  6 10:24:56 model-zfs kernel: [291246.888830]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  6 10:24:56 model-zfs kernel: [291246.888852] ata4.00: status: { DRDY }
Apr  6 10:24:56 model-zfs kernel: [291246.888883] ata4: hard resetting link
Apr  6 10:24:57 model-zfs kernel: [291247.248428] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr  6 10:24:57 model-zfs kernel: [291247.249216] ata4.00: configured for UDMA/133
Apr  6 10:24:57 model-zfs kernel: [291247.249229] ata4.00: device reported invalid CHS sector 0
Apr  6 10:24:57 model-zfs kernel: [291247.249254] ata4: EH complete

This error occurs multiple times on various drives, occasionally with a failed command of 'READ FPDMA QUEUED' or (twice) 'WRITE DMA', to the extent that the kernel eventually reports:

Apr  6 11:51:32 model-zfs kernel: [296442.857945] ata4.00: NCQ disabled due to excessive errors

This does not stop the errors from being reported.

An internet search showed that this error had been logged on the VirtualBox.org website about 4 years ago (https://www.virtualbox.org/ticket/8311) against version 4.0.2 of VirtualBox; it was apparently considered fixed, but then reopened.

I'm running VirtualBox 4.3.18_Debian r96516 on Debian (Sid), kernel version 3.16.0-4-amd64 (which is the guest OS as well as the host OS). ZFS is version 0.6.3, from ZFSonLinux.org/debian.html.

I would have thought more work would have been done on this in the intervening years. I can't believe I'm the only person to try out ZFS under VirtualBox, so I would have expected this error to have been identified and resolved by now, especially as both ZFS and VirtualBox are maintained by Oracle.

Or is it simply the case that ZFS stresses the virtual machine to its limits and the simulated drive/controller just can't respond fast enough?


Update:

In the 14 hours since I created the pool, the VM has reported 204 kernel ATA errors. Most of the failed commands are 'WRITE FPDMA QUEUED', followed by 'READ FPDMA QUEUED', 'WRITE DMA' and a single 'FLUSH CACHE'. Presumably ZFS retried the commands, but so far I am wary of using ZFS on a real server if it produces so many errors in a virtual machine!
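The pool's per-device error counters can be checked to see whether any of these retries surfaced at the ZFS level:

zpool status -v tank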

Best Answer

These look like generic hdd timeout errors in the guest system. They might be caused by ZFS, but they could just as well be caused by any other high-i/o workload. As a guest system, Linux is quite sensitive in this regard, because its default disk timeout is low (usually 30 seconds). That may not be enough in a vm, especially if the disk image is a regular file and the host system is under load; some writes can take longer than expected when the host's cache is full.

Or, to quote the VirtualBox manual:

However, some guests (e.g. some Linux versions) have severe problems if a write to an image file takes longer than about 15 seconds. Some file systems however require more than a minute to complete a single write, if the host cache contains a large amount of data that needs to be written.

Note that this is not limited to VirtualBox. Other virtualization solutions may show the same behavior when running a Linux guest.

As for the timeout itself: The Linux hdd timeout (leading to ata exceptions and possibly corruption under high load) can be increased in the guest system.
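The current value (in seconds) can be read straight from sysfs; with the 30-second default mentioned above, this prints 30:

# cat /sys/block/sda/device/timeout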

For example, on Debian 7, all you need to do is add a few lines to your /etc/rc.local:

$ cat /etc/rc.local 
#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution
# bits.
#
# By default this script does nothing.

# 86400 seconds = 24 hours, i.e. effectively never time out
TIMEOUT=86400
# apply to every sd* disk the guest can see
for f in /sys/block/sd?/device/timeout; do
    echo $TIMEOUT >"$f"
done

exit 0
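Since rc.local only runs at boot, run the same loop once as root to apply the change immediately, and make sure the script is executable so it runs on future boots:

# chmod +x /etc/rc.local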

Then grep for ata exceptions to see if they're gone:

# grep -Rn --col 'ata.*exception' /var/log/

However, it would be preferable to increase the vm's disk performance rather than having to change the timeout of the guest system. In the case of VirtualBox, the "Host I/O Cache" of the vm's virtual storage controller can be disabled. If enabled, the host cache can become the bottleneck and slow disk operations down when there is a lot of disk i/o on the host. On the other hand, disabling it may increase the load on the vm itself, so timeouts can still occur if the guest is overloaded; depending on your workload, enabling the host cache might even be the better choice in some cases.
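The same setting can be toggled from the command line. The controller name "SATA" below is an assumption; check the actual name with VBoxManage showvminfo:

VBoxManage storagectl "VM name" --name "SATA" --hostiocache off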


If this does not help, the VirtualBox manual also recommends experimenting with the flush interval:

For IDE disks use the following command:

VBoxManage setextradata "VM name" "VBoxInternal/Devices/piix3ide/0/LUN#[x]/Config/FlushInterval" [b]

For SATA disks use the following command:

VBoxManage setextradata "VM name" "VBoxInternal/Devices/ahci/0/LUN#[x]/Config/FlushInterval" [b]

Values between 1000000 and 10000000 (1 to 10 megabytes) are a good starting point. Decreasing the interval both decreases the probability of the problem and the write performance of the guest.
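For instance, a 1 MB flush interval on the first SATA disk would look like this (the vm name and LUN number are illustrative):

VBoxManage setextradata "model-zfs" "VBoxInternal/Devices/ahci/0/LUN#0/Config/FlushInterval" 1000000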


In some tests, VirtualBox guest systems have experienced such hdd timeouts (crashing the vm and/or causing corruption) regardless of whether host i/o caching was enabled. The host filesystem was not slow, except for half a minute whenever a scheduled cron job ran, and that was enough to trigger the timeouts in the vm. Only after raising the hdd timeout as described above did the issue go away, with no more timeouts happening.
