Linux file system for a big file server

Tags: ext4, filesystems, linux

I would like to know, from more experienced people, what the best choice of file system would be for a file server with more than 20 TB of hard disk space.
Personally I have always used EXT3 (back in the day) and EXT4 (since it became available) [and once ReiserFS 3, though it caused a lot of data corruption] on my personal computers and on the BOOT and ROOT disks of my "little servers".

However, as the EXT4 tools (though not EXT4 itself) are limited to 16 TB partitions, this may not be my best bet.
The distribution will be Debian 6.0 (Squeeze) and/or Gentoo (latest version), so the kernel should be fairly recent (on Debian at least, with backports), meaning Linux kernel >= 2.6.32.

The file server will be used for mainly three purposes (on separate partitions as well, because the goal is to keep the data "safe" and I don't really care much about overhead).
All disks are meant to be encrypted using LUKS (a minimal RAID + LUKS sketch follows the list below):

  1. Media, downloads and a local Debian repository [I have at least 6 machines running Debian], >20 TB (maybe with further separation between media, downloads and the Debian repository)
  2. Data (documents, photos, …), ~4 TB, SAFE (meaning RAID 1 or RAID 6 + a backup disk)
  3. Backups, >= 20 TB, for backups of the other computers on my gigabit LAN (can you suggest software that backs up an entire OS, even if it's Windows? BackupPC says it does that; any alternatives?)
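
For the encrypted, RAID-backed partitions, the usual layering on Linux is software RAID (mdadm), LUKS on top of the array, and the filesystem on top of the LUKS mapping. A minimal sketch, assuming hypothetical member disks /dev/sdb–/dev/sdk and a mapping named "data" (adapt devices, RAID level and mount point to your setup):

    # Build a 10-disk RAID 6 array (device names are placeholders)
    mdadm --create /dev/md0 --level=6 --raid-devices=10 /dev/sd[b-k]

    # Encrypt the whole array with LUKS, then open the mapping
    cryptsetup luksFormat /dev/md0
    cryptsetup luksOpen /dev/md0 data

    # Create the filesystem on the decrypted device and mount it
    mkfs.ext4 /dev/mapper/data
    mount /dev/mapper/data /srv/data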

High speeds are not really necessary (concurrent accesses: at most 2 or 3 large files, say videos); even if it's "just" 200 MB/s reads from a 10-HDD RAID 6, I can live with that.

In summary, I am looking for a reliable, scalable (i.e. easily expandable) filesystem that supports more than 20 TB per partition. The safer and more reliable the FS is, the better. The hardware will be at least a quad core (AMD X4 630 or Intel i5-2500K) with plenty of RAM (>8 GB, maybe >16 GB), so hardware requirements should be met.

My PCs/server will be connected to a UPS (Uninterruptible Power Supply) in case of a power outage.
I might also put media and backups on separate machines (i.e. two servers).

Best Answer

A lot of people are suggesting ZFS, but ZFS is not available natively under Linux except through FUSE. I wouldn't recommend this for your situation, where performance is likely to be important.

Unfortunately, ZFS will never be available as a native kernel module unless the licensing issues are sorted out somehow.

XFS is good, but some people have reported corruption issues, and I can't really comment on that. I've played with small XFS partitions and not had those problems, but not in production.
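
XFS does not suffer from the 16 TB tool limit you mention for EXT4, and it can be grown online when the underlying device is enlarged. A minimal sketch, assuming a hypothetical device /dev/mapper/media mounted at /srv/media:

    # Create an XFS filesystem on the large (>16 TB) device and mount it
    mkfs.xfs /dev/mapper/media
    mount /dev/mapper/media /srv/media

    # After enlarging the underlying device (e.g. growing the RAID array),
    # grow the filesystem online to use the new space
    xfs_growfs /srv/media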

ZFS has too many advantages and useful features to be ignored, though. In summary they are (see the ZFS Wiki for a full description of what they mean; a short pool-and-snapshot sketch follows the list):

  • Data integrity
  • Storage pools
  • L2ARC
  • High capacity
  • Copy on write
  • Snapshots & clones
  • Dynamic striping
  • Variable block sizes
  • Lightweight filesystem creation
  • Cache management
  • Adaptive endianness
  • Deduplication
  • Encryption
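
To give a feel for the storage pool, lightweight filesystem and snapshot features, here is a minimal sketch using the standard zpool/zfs tools (the pool name "tank" and the disk names are placeholders; the commands are the same on any system with native ZFS):

    # Create a RAID-Z2 (double parity) pool from ten disks (placeholder names)
    zpool create tank raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 \
                             c0t5d0 c0t6d0 c0t7d0 c0t8d0 c0t9d0

    # Lightweight filesystem creation and per-dataset properties
    zfs create tank/media
    zfs set compression=on tank/media

    # Instant snapshot of the current state, then list snapshots
    zfs snapshot tank/media@before-cleanup
    zfs list -t snapshot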

So how do we get around that? My suggested alternative, which may suit your situation, is to consider Nexenta. This is an OpenSolaris kernel with GNU userland tools running on top. Having an OpenSolaris kernel means having ZFS available natively.
