I have a file server running OpenSolaris (mainly for ZFS, which I really love), but since Oracle has abandoned it, I'm starting to think about what I'm going to do with the OS on that machine. Assuming you have a machine like this and you're in a similar situation, what are your plans? Switching to FreeBSD? Holding out to see if Illumos takes off? Nexenta? I'm curious what other people are thinking on this.
What are you going to do with your OpenSolaris machines?
solaris, upgrade, zfs
Related Solutions
Putting your zpool as files on an existing file system means you're relying on that file system to provide consistency (which sounds dangerous at best) and also means that ZFS can't take good advantage of caching. I'm not sure how well ZFS would handle the transfer from files to physical devices; the file system itself probably wouldn't have any real complaints, but you might run into problems such as ZFS refusing to let a vdev move onto a smaller device (from what I've read, a number of people have been bitten by this after setting autoexpand=on, so you might want to be careful with that property and its cousin autoreplace). Alternatively, you'd be running ZFS on top of LVM, which is probably possible but doesn't allow ZFS to handle the devices intelligently, since it would only see one huge device. Remember that ZFS is not just a file system, it's a volume manager as well, so it properly replaces both the regular file system and LVM. Many of its features, including metadata placement on multiple disks and multiple copies of data for redundancy within a zpool, work best when ZFS has a good idea of the physical storage layout.
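If you do end up playing with those properties, you can check and flip them per pool; a quick sketch (the pool name tank is just a placeholder):
# show the current values of both properties
zpool get autoexpand,autoreplace tank
# disable automatic expansion if you want to avoid surprises on device replacement
zpool set autoexpand=off tank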
I've been considering migrating to ZFS as well, and the best option I've been able to come up with for migration involves one more hard disk. Install another hard disk that is at least the size of the smallest physical drive you currently have in the array, make a ZFS pool and file system on it (configured for JBOD but with only one device), and move as much onto it as you can. (Since I'm not running LVM, I'd move everything on the smallest drive onto the ZFS fs.) Reduce the LVM array by removing one physical disk from it, expand the ZFS zpool onto that now-free disk, move some more files, rinse and repeat until done. With clever use of symlinks or good handling of exports, you may even be able to keep the process transparent for anyone who might be using files on that NAS box of yours in the meantime.
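Roughly, each iteration of that loop might look like the following on a Linux/LVM box (the volume group myvg, the device names, and the pool name tank are all made up here, and shrinking the existing file system and logical volume beforehand is left out):
# create a single-disk pool on the newly added drive
zpool create tank /dev/sdd
# move as much data over as will fit
rsync -aHAX /mnt/lvm/share/ /tank/share/
# once a physical volume can be emptied, migrate its extents and pull it out of the VG
pvmove /dev/sdb1
vgreduce myvg /dev/sdb1
pvremove /dev/sdb1
# grow the pool onto the freed disk (this adds another top-level vdev, so still no redundancy)
zpool add tank /dev/sdb
# ...then repeat the copy/shrink/add cycle until everything has moved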
For a general overview, see Resolving Problems with ZFS; the most interesting part:
The second section of the configuration output displays error statistics. These errors are divided into three categories:
- READ – I/O errors that occurred while issuing a read request
- WRITE – I/O errors that occurred while issuing a write request
- CKSUM – Checksum errors, meaning that the device returned corrupted data as the result of a read request
These errors can be used to determine if the damage is permanent. A small number of I/O errors might indicate a temporary outage, while a large number might indicate a permanent problem with the device. These errors do not necessarily correspond to data corruption as interpreted by applications. If the device is in a redundant configuration, the devices might show uncorrectable errors, while no errors appear at the mirror or RAID-Z device level. In such cases, ZFS successfully retrieved the good data and attempted to heal the damaged data from existing replicas.
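To tie that to what you actually see, here is an illustrative (made-up) zpool status excerpt; note that the pool, the mirror vdev, and each disk all get their own line with their own counters:
  pool: tank
 state: ONLINE
        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     3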
Now, for your questions:
First, what does "device" mean in this context? Are they talking about a physical device, the vdev or even something else? My assumption is that they are talking about every "device" in the hierarchy. The vdev error count then probably is the sum of the error counts of its physical devices, and the pool error count probably is the sum of the error counts of its vdevs. Is this correct?
Each device is checked independently and its own errors are summed up. If an error is present on both sides of a mirror, or the vdev has no redundancy at all, it propagates upwards. So, in other words, each line shows the number of errors affecting that device or vdev itself (which is also in line with the logic of displaying each line separately).
But what I am really interested in is whether there have been checksum errors at ZFS level (and not hardware level). I am currently convinced that CKSUM is showing the latter (otherwise, it wouldn't make much sense), but I'd like to know for sure.
Yes, it is the hardware side (non-permanent stuff like faulty cables, suddenly removed disks, power loss, etc.). I think it is also a matter of perspective: faults on the "software side" would mean bugs in ZFS itself, i.e. unwanted behavior that was never checked for (assuming all normal user interactions are deemed correct) and that ZFS cannot recognize on its own. Fortunately, such bugs are quite rare nowadays. Unfortunately, they are also quite severe much of the time.
Third, assuming the checksum errors they are talking about are indeed checksum errors at the ZFS level (and not the hardware level), why on earth do they only show the count of uncorrectable errors? This does not make any sense. We would like to see every checksum error, whether correctable or not, wouldn't we? After all, a checksum error means that there has been some sort of data corruption on the disk which has not been detected by the hardware, so we probably want to replace that disk as soon as there is any error (even if the mirror disk can still act as "backup"). So possibly I have not yet understood what exactly they mean by "uncorrectable".
Faulty disks are already indicated by read/write errors (for example, a URE from a disk). Checksum errors are what you are describing: a block was read, but its contents were not deemed correct by the checksums of the blocks above it in the tree, so instead of being returned it was discarded and noted as an error. "Uncorrectable" is more or less true by definition: if you get garbage and know that it is garbage, you cannot correct it, but you can ignore it and not use it (or try again). The wording might be unnecessarily confusing, though.
According to that paragraph, there could be two sorts of errors: data corruption errors and device errors. A mirror configuration of two disks is undoubtedly redundant, so (according to that paragraph) it is not a data corruption error if ZFS encounters a checksum error on one of the disks (at the ZFS checksum level, not the hardware level). That means (once more according to that paragraph) that this error will not be recorded as part of the persistent error log.
Data corruption in this paragraph means that some of your files are partly or completely destroyed and unreadable, and you need to fetch your last backup as soon as possible and replace them. It is the point where all of ZFS's precautions have already failed and it cannot help you anymore (but at least it informs you about it now, not at the next check-disk run on server boot).
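That persistent record of data corruption is what zpool status -v prints at the bottom of its output; an illustrative (made-up) example of the worst case:
errors: Permanent errors have been detected in the following files:
        /tank/data/photos/img_0042.jpg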
For me, the main reason for switching to ZFS was its ability to detect silent bit rot on its own, i.e. to detect and report errors on devices even if those errors did not lead to I/O failures at the hardware / driver level. But not including such errors in the persistent log would mean losing them upon reboot, and that would be fatal (IMHO).
The idea behind ZFS systems is that they do not need to be taken down to find such errors, because the file system can be checked while online. Remember, 10 years ago this was a feature absent from most small-scale systems. So the idea is that (on a redundant config, of course) you can check for read and write errors from the hardware and correct them by using known-good copies. Additionally, you can scrub each month to read all data (because data that is never read cannot be known to be good) and correct any error you find.
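In practice that monthly check is just a scrub followed by a status check; a minimal sketch (pool name tank assumed):
# read and verify every block in the pool, repairing from redundancy where possible
zpool scrub tank
# check progress and any errors it turned up
zpool status -v tank
# e.g. a monthly system crontab entry (path and schedule are just examples):
# 0 3 1 * * root /sbin/zpool scrub tank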
It is like a big archive/library of old books: you have valuable and not-so-valuable books, some might decay over time, so you need a person who goes around each week or month, looks at every page of every book for mold, bugs, etc., and tells you if he finds anything. If you have two identical libraries, he can go over to the other building, look at the same page of the same book, and replace the destroyed page in the first library with a copy. If he never checked any book, he might be in for a nasty surprise 20 years later.
Best Answer
I saw this coming about 6 months ago when I started playing with ZFS. At the time, the next release of OpenSolaris was already way overdue, and I had yet to be impressed by the progress between any two releases I'd seen over the years I'd been watching the project. It was clear to me that OpenSolaris wasn't winning hearts and minds, so by the time I started my ZFS adventure, it was already out of the running as far as I was concerned.
I ended up picking FreeBSD 8, for a bunch of reasons:
Really, actually free-as-in-beer. As opposed to the "free only if you need less than 4 TB" come-on offered by Nexenta.
Really, actually free-as-in-freedom.
I learned this lesson back when transitioning from Novell (!) UnixWare to this new cool thing called Linux. I spent an unholy number of hours on Usenet arguing with the UnixWare fans, trying to convince them that Linux was going to take over the *ix world. The UnixWare fans would keep pointing out this little advantage or that; and they were right, UnixWare did have some technical advantages at the time. But they saw me surfing a wave behind their serious yacht and noticed only the surfboard, not realizing that it would take a tsunami for me to be surfing that far out. A decade later, we're still skimming the splinters of their yacht off the ocean surface.
I believe the same thing will happen to Solaris, sooner or later. I have more than just SCO in mind when I make this prediction. Think on how many other marginal Unices have already fallen and how embattled the remaining biggies — AIX and HP-UX — already are.
My only bit of unease here is that I'm not sure that FreeBSD won't also end up under the waves. It could be that Linux + btrfs will someday take the ZFS advantage away from FreeBSD. But, if it does, you still have the two freedoms to fall back on. Consider the least popular of the BSDs, NetBSD: even complete marginalization isn't enough to destroy a truly free OS.
It ran on hardware Nexenta wouldn't. This is more than just an anecdote, it's a reflection of the fact that some OSes get more driver support than others. If this is your only criterion, of course, you'd run Windows, not *ix of any sort. Since it isn't, you have to ask how far down the slope toward "truly rotten driver support" you'll allow yourself to slide. Windows > Linux > FreeBSD > Solaris, in the driver support area.