Linux – How to monitor BTRFS filesystem raid for errors


I saw some documentation on a daemon that can execute a program/script for various BTRFS events, but I cannot find it anymore.

How can I have a script/program be executed on a drive failure for a BTRFS raid1 array? I would like to run a script on any error to act as an early warning for a potentially failing drive, but the actual drive failure is most important. I would like to unmount the filesystem at that point (if that's not what BTRFS does anyway) and set an alarm.

Best Answer

In addition to the regular logging system, BTRFS does have a stats command, which keeps track of errors (including read, write and corruption/checksum errors) per drive:

# btrfs device stats /
[/dev/mapper/luks-123].write_io_errs   0
[/dev/mapper/luks-123].read_io_errs    0
[/dev/mapper/luks-123].flush_io_errs   0
[/dev/mapper/luks-123].corruption_errs 0
[/dev/mapper/luks-123].generation_errs 0

So you could create a simple root cronjob:
@hourly /sbin/btrfs device stats /data | grep -vE ' 0$'

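This checks for nonzero error counts every hour. Cron mails any output that a job produces, so you receive an email only when grep prints at least one counter (this assumes mail delivery is set up for the root user on that host). Obviously, you should test such a scenario (for example by causing corruption or removing the grep) to verify that the email notification works.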
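If you would rather have a small script than a one-liner, the same idea can be sketched as follows. This is only an illustration under a few assumptions: the function names are made up for this example, `btrfs` is assumed to be on root's PATH, and the stats output is assumed to have the `[device].counter value` layout shown above.

```shell
#!/bin/sh
# Sketch only: nonzero_counters and check_stats are illustrative names.

# Read `btrfs device stats` output on stdin and print only the lines
# whose last field is a number greater than zero.
nonzero_counters() {
    awk '$NF ~ /^[0-9]+$/ && $NF + 0 > 0'
}

# check_stats MOUNTPOINT: print the nonzero counters for that filesystem
# and return 1 if there are any, so that cron has something to mail.
check_stats() {
    out=$(btrfs device stats "$1" | nonzero_counters)
    if [ -n "$out" ]; then
        printf 'BTRFS error counters on %s:\n%s\n' "$1" "$out"
        return 1
    fi
}

# Only run the check when a mount point is given as an argument.
if [ "$#" -gt 0 ]; then check_stats "$1"; fi
```

Saved under a path of your choice (for example the hypothetical /usr/local/sbin/btrfs-stats-check), it could be called from the root crontab with /data as its argument, just like the one-liner. It stays silent when all counters are zero.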

In addition, with advanced filesystems like BTRFS (that have checksumming) it's often recommended to schedule a scrub every couple of weeks to detect silent corruption caused by a bad drive.

@monthly /sbin/btrfs scrub start -Bq /data

The -B option will keep the scrub in the foreground, so that you will see the results in the email cron sends you. Otherwise, it'll run in the background and you would have to remember to check the results manually as they would not be in the email.
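Because -B keeps the scrub in the foreground, a wrapper script can also look at its exit status. The sketch below assumes the exit codes listed in the btrfs-scrub man page as I remember them (0 success, 1 scrub could not be performed, 2 nothing to resume, 3 uncorrectable errors found); please verify them against the man page of your btrfs-progs version. The function names are made up for this example.

```shell
#!/bin/sh
# Sketch only: the exit code meanings are an assumption taken from the
# btrfs-scrub man page; check them for your btrfs-progs version.

# Translate a `btrfs scrub start` exit status into a short message.
describe_scrub_exit() {
    case "$1" in
        0) echo "scrub finished without uncorrectable errors" ;;
        1) echo "scrub could not be performed" ;;
        2) echo "nothing to resume" ;;
        3) echo "scrub found uncorrectable errors" ;;
        *) echo "unexpected exit status $1" ;;
    esac
}

# run_scrub MOUNTPOINT: scrub in the foreground and report the outcome.
run_scrub() {
    rc=0
    btrfs scrub start -B "$1" || rc=$?
    echo "scrub of $1: $(describe_scrub_exit "$rc")"
    return "$rc"
}

# Only start a scrub when a mount point is given as an argument.
if [ "$#" -gt 0 ]; then run_scrub "$1"; fi
```

Run from cron, this always produces one summary line, so you get an email after every scrub and not only after a failed one.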

Update: Improved grep as suggested by Michael Kjörling, thanks.

Update 2: Additional notes on scrubbing vs. regular read operations (this doesn't apply only to BTRFS):
As pointed out by Ioan, a scrub can take many hours, depending on the size and type of the array (and other factors), even more than a day in some cases. It is also a one-off active scan: it won't detect future errors. The goal of a scrub is to find and fix errors on your drives at that point in time. As with other RAID systems, it is recommended to schedule periodic scrubs.
It's true that a typical I/O operation, like reading a file, does check whether the data that was read is actually correct. But consider a simple mirror: if the first copy of the file is damaged, maybe by a drive that's about to die, but BTRFS happens to read the second copy, which is correct, then BTRFS won't know that there is corruption on one of the drives. The requested data has been received and it matches the checksum BTRFS has stored for it, so there's no need for BTRFS to read the other copy. This means that even if you specifically read a file that you know is corrupted on one drive, there is no guarantee that the corruption will be detected by this read operation.
Now, let's assume that BTRFS only ever reads from the good drive, no scrub is run that would detect the damage on the bad drive, and then the good drive goes bad as well. The result would be data loss (at least BTRFS would know which files are still correct and would still allow you to read those). Of course, this is a simplified example; in reality, BTRFS won't always read from one drive and ignore the other.
The point is that periodic scrubs are important because they will find (and fix) errors that regular read operations won't necessarily detect.

Faulted drives: Since this question is quite popular, I'd like to point out that this "monitoring solution" is for detecting problems with possibly bad drives (e.g., dying drive causing errors but still accessible).

On the other hand, if a drive is suddenly gone (disconnected or completely dead rather than dying and producing errors), it would be a faulted drive (ZFS would mark such a drive as FAULTED). Unfortunately, BTRFS may not realize that a drive is gone while the filesystem is mounted, as pointed out in this mailing list entry from 09/2015 (it's possible that this has been patched):

The difference is that we have code to detect a device not being present at mount, we don't have code (yet) to detect it dropping on a mounted filesystem. Why having proper detection for a device disappearing does not appear to be a priority, I have no idea, but that is a separate issue from mount behavior.

There'd be tons of error messages in dmesg by that time, so grepping dmesg might not be reliable.
For a server using BTRFS, it might be an idea to have a custom check (cron job) that sends an alert if at least one of the drives in the RAID array is gone, i.e., not accessible anymore...
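A minimal sketch of such a check is shown below. It relies on assumptions that may not hold on your system: that `btrfs filesystem show` prints one "devid ... path /dev/..." line per device, and that a vanished drive also loses its device node. The latter is only a rough proxy for "the drive is gone", and all function names are made up for this example.

```shell
#!/bin/sh
# Sketch only: function names and the alert wording are illustrative.

# Read `btrfs filesystem show` output on stdin and print the word that
# follows "path" on each "devid" line.
list_device_paths() {
    awk '$1 == "devid" {
        for (i = 2; i < NF; i++)
            if ($i == "path") { print $(i + 1); break }
    }'
}

# Read device paths on stdin and print those that are not block devices.
missing_devices() {
    while IFS= read -r dev; do
        [ -b "$dev" ] || printf '%s\n' "$dev"
    done
}

# check_devices MOUNTPOINT: print an alert and return 1 if a listed
# device node no longer exists; return 2 if btrfs itself fails.
check_devices() {
    info=$(btrfs filesystem show "$1") || return 2
    gone=$(printf '%s\n' "$info" | list_device_paths | missing_devices)
    if [ -n "$gone" ]; then
        printf 'BTRFS devices not accessible on %s:\n%s\n' "$1" "$gone"
        return 1
    fi
}

# Only run the check when a mount point is given as an argument.
if [ "$#" -gt 0 ]; then check_devices "$1"; fi
```

Run from cron with the mount point as its argument, it prints nothing while all device nodes exist, so an email arrives only when a device has disappeared or the btrfs command itself reports an error.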