How to achieve maximum sustained sequential disk write performance

Tags: hard-drive, hardware-raid, performance, raid, streaming

I need data write rates of ~1 GB/sec sustained for ~1 hour. Data is coming in over this PCIe x4 frame grabber. I need to stream its full bandwidth to disk.

I don't have experience with RAID, but as best I can tell, RAID 0 with as many high-RPM disks as possible is the answer. I also gather that discrete RAID controllers are much faster and more reliable than anything built into a motherboard.

For the sake of a specific starting point for concrete answers, my initial guess is that the following hardware will be a good system for this task:

  • RAID controller: LSI MegaRAID 9280-16i4e
  • HDD's: 11x Western Digital Caviar Black 2 TB SATA III 7200 RPM 64 MB
  • Cables: 3ware CBL-SFF8087OCF-10M SFF-8087 Serial ATA Breakout Cable
  • Motherboard: Gigabyte GA-Z77X-UD3H LGA 1155 Intel Z77
  • Power supply: Silverstone Strider Gold Evolution SST-ST1200-G 1200W v2.3 80 PLUS GOLD
  • Case: Rosewill RSV-L4411 4U case 12 hot swap bays

My question is: how do I achieve maximum sustained sequential disk write performance?

A good answer will address the following:

  • What features/specs do I need to look for in the RAID controller and HDD's for fastest sequential writes?
  • Will write speed be independent of the CPU (i.e., how do I ensure DMA is used)? Is there a way for the data path to bypass RAM entirely? Would quad- vs dual-channel RAM matter?
  • Is there any bottleneck to look out for on the motherboard, i.e. the north/south bridges? If so, how would I detect/avoid such a problem?
  • In sustained sequential writing, are any caches (on the controller, HDD's, CPU, etc) relevant?
  • How do I ensure the PSU is adequate for all these drives? I understand I might have to worry about amperage draw on the rails? Will inadequacies here show up as performance problems/random crashes, or will it just clearly work or fail?
  • Same question as above, regarding cooling.
  • Would there be an advantage to using an external drive enclosure? Does connecting to them impose a bottleneck?
  • What BIOS settings are important for this application? AHCI, etc?
  • What filesystem is best? The camera/frame-grabber drivers are all Windows, so I'm stuck on Win7. I assume 64-bit vs 32-bit will improve bandwidth?
  • What tuning should I expect to have to do?

A previous version of this question was removed for being "too broad":

"There are either too many possible answers, or good answers would be too long for this format. Please add details to narrow the answer set or to isolate an issue that can be answered in a few paragraphs."

But my question is very specific, of general interest, and I have provided details that allow an efficient answer in a single paragraph, not "a whole book." All my detailed questions merely ensure that answers are comprehensive regarding the potential bottlenecks that anyone should be concerned with for this single problem: fast sustained sequential writes. It wouldn't be useful to anyone to break up the question into 32 separate questions, as user 50-3 suggested. Here is an example response that shows the form of what I'm expecting (I have no idea if the actual information is correct; it is my best guess):

  • RAID 0 with high RPM disks is indeed the way to achieve fastest sustained sequential writes (assuming you are using your frame grabber's "stream" mode). SSD's aren't good for this because they dramatically slow down their write time with usage due to processing required for "leveling" (preventing any one location from being used more than others).
  • To sustain 1GB/sec indefinitely, you need >3 7200RPM 6Gb/sec SATA drives (6Gb/sec * 1/8 GB/Gb = .75 GB/sec/drive with no headroom). More drives will improve your bandwidth headroom linearly, but saturate after the data width of your bus (32 or 64).
  • SATA is the most cost-effective HDD technology, SAS doesn't have appreciable advantages for fast sequential writes. SAS is better for seek times to random locations and reliability. The faster RPM in SAS would increase sequential write speed, but is counteracted by lower density/capacity.
  • Any decent drivers for frame grabbers/RAID cards use DMA (the ones you mention do), so CPU won't matter. The data path will always include system memory. Writing to disk will be much slower than your RAM, so you don't need anything exotic (any DDR3 is fine). The amount of RAM (and size of caches on controller, HDD's, CPU) does not matter, because buffers quickly fill during sustained writes.
  • The north/south bridge on any PCIe 2.0 motherboard won't be bottlenecks. All you need is a discrete RAID controller >= PCIe 2.0 that has enough SATA connections for the drives you have. External connections to an enclosure are a bottleneck only if using expanders causing drives to share bandwidth. You want a card with more PCIe lanes than the 4 on the frame grabber so the PCIe bus won't be a bottleneck. The 9280 will be fine, but is a lot of overkill for your purpose; a 9240 8i would be less than half the cost and adequate. LSI controllers are among the most expensive but tend to be faster/more reliable/less hassle during error recovery than cheaper brands Highpoint/Areca.
  • You need a PSU with enough wattage for all your drives and the controller (the 9280 uses 15W and each WD uses 10W). Each drive has a peak draw of ~1A current and you need to limit the number on each circuit ("rail") of the PSU. The 1200G has one rail with 100A, so you won't have a problem. Overdraws would show up as random hard crashes (possibly damaging the drives and other components), same for overheats.
  • The cooling built in to a case made with 12 hot swap bays should be adequate for near-constant loads of non-sequential reads, which produce more heat than your sequential writes. To be sure you don't need additional cooling, monitor temp (google HDDTemp) after many minutes of sustained writes.
  • AHCI is the only BIOS setting relevant to fast sequential writes (turn on SMART too). Set both of these before installing Windows.
  • Windows' NTFS file system will be fine (there's no alternative anyway).
  • You will have better sequential write performance with win64 vs win32 because the DMA bandwidth to the raid controller will be twice as big.
  • You shouldn't have to do any tuning; the default block size, etc set up by your raid controller should be adequate. Bigger blocks would be faster, but more susceptible to corruption and unnecessary.

If you still consider this question "too broad," please specify exactly why and suggest how it could be narrowed, while still providing a thorough answer for people interested in achieving maximum sustained sequential write performance. This question belongs on Superuser more than Serverfault because it is not specific to corporate IT.

Best Answer

You made a very long list, which I am not going to answer point by point. However, I want to make these things very clear:

1) PCI cannot sustain those speeds. PCI Express can; it is a totally different technology, with point-to-point links (called lanes) instead of a shared bus. The card you linked to is "PCIe x4". The extra 'e' is very much relevant.

2) Striping (RAID 0, RAID 10, etc.) is quite possible, either with a dozen high-performance disks or with normal disks. A bog-standard, office-corner-shop 7200 RPM SATA drive will do about 100 MB/s, so you would need at least a dozen of these (since things never scale perfectly).
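
If you want to sanity-check that ~100 MB/s figure on your own hardware before buying a stack of drives, a minimal Python sketch like the one below will do; the D:\bench.tmp path and the 8 GiB test size are assumptions you should adapt (use a file larger than your RAM so the OS cache can't flatter the result).

    # Rough single-drive sequential write benchmark (adjust TARGET and TOTAL).
    import os, time

    TARGET = r"D:\bench.tmp"        # a path on the drive/array under test (assumption)
    BLOCK = 8 * 1024 * 1024         # 8 MiB writes keep the workload sequential
    TOTAL = 8 * 1024 ** 3           # write 8 GiB; pick something larger than system RAM

    buf = os.urandom(BLOCK)
    written = 0
    start = time.perf_counter()
    with open(TARGET, "wb", buffering=0) as f:
        while written < TOTAL:
            f.write(buf)
            written += BLOCK
        f.flush()
        os.fsync(f.fileno())        # make sure the data actually reached the disk
    elapsed = time.perf_counter() - start
    print(f"{written / elapsed / 1e6:.0f} MB/s sustained")
    os.remove(TARGET)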

3) HW RAID, software RAID, and fake RAID (software RAID with BIOS support, e.g. Intel Rapid Storage Technology) will all work.

Software RAID is not recommended if you need to do a lot of calculations (e.g. RAID 6 parity) and need high performance, or if you have a slow CPU.

Hardware RAID will vary. A good HW RAID card is great; a bad one might perform quite poorly compared to a good SW RAID solution. Good HW RAID often needs a battery-backed cache or flash to enable the fast (write-back) cache modes.

4) SATA II or III (3.0 or 6.0 Gbit/s), SAS 3 Gbit/s, SAS 6 Gbit/s, ... it does not matter. An individual spinning disk will not saturate any of these links. Current consumer SATA drives max out around 100 MB/s; high-end enterprise SAS drives can get up to 200 MB/s. Both are well below even a 3.0 Gbit/s (~300 MB/s) link.

5) RAID 0 is not very safe. If one disk fails, you lose everything. This might be acceptable if you just need to capture the data and then immediately copy it somewhere safe or process it. However, the more disks you use, the more likely it is that one of them fails.

RAID is usually about redundancy. RAID 0 is not; it is solely about performance.

6) Lastly, for completeness' sake: SSDs are not inherently bad for this. For this much data they will be expensive and possibly unnecessary, but an SSD does not need to slow down. Just completely wipe the SSD (e.g. delete all partitions, or secure-erase it) before you add it to the recording array. Once it is full it may slow down, but prep it properly, run it for one session, and it should be fine.

7)

AHCI is the only BIOS setting relevant to fast sequential writes (turn on SMART too).

You cannot turn SMART on or off; it is always on at the drive. The option in the BIOS just means "read the drive's SMART data during POST and warn the user if anything is wrong", usually with a single line like "SMART: DISK FAILURE IMMINENT. Press F1 to continue!". It has no influence on performance.

Set both of these before installing Windows.

For consistent performance: Install the OS on its own drive. Keep separate volumes for OS and for data.

8)

To sustain 1GB/sec indefinitely, you need >3 7200RPM 6Gb/sec SATA drives (6Gb/sec * 1/8 GB/Gb = .75 GB/sec/drive with no headroom).

No.

A SATA drive on a 6 Gbit/s link will be able to transfer roughly 600 MB/s between the disk and the controller/RAID card (6.0 divided by 8 for bits-to-bytes would suggest 750 MB/s, but the 8b/10b line encoding adds overhead, so dividing by 10 is more realistic).
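
A one-line sketch of that encoding overhead (the /10 comes from the 8b/10b line code used by SATA and SAS):

    # Usable link bandwidth: divide the line rate by 10 (8b/10b), not by 8.
    for name, gbit_s in [("SATA 3 Gbit/s", 3.0), ("SATA 6 Gbit/s", 6.0), ("SAS 6 Gbit/s", 6.0)]:
        print(f"{name}: ~{gbit_s * 1000 / 10:.0f} MB/s on the wire")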

Secondly, the drive can receive data over the link quite quickly, but actually writing it to the platters is slower. A realistic value for a modern 7200 RPM SATA drive is about 100 MiB/s of sustained write.

That means you need at least 10 such drives, and that is only if everything scales perfectly.
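
Putting those two numbers together, a quick back-of-envelope sketch (the 80% striping efficiency is an assumption, not a measurement) lands in the same 10-to-13 drive range:

    import math

    target_mb_s    = 1000   # what the frame grabber delivers
    per_drive_mb_s = 100    # realistic sustained write for a 7200 RPM SATA drive
    efficiency     = 0.8    # assumed RAID 0 scaling loss; striping never scales perfectly

    print(math.ceil(target_mb_s / per_drive_mb_s))                  # 10 drives, perfect scaling
    print(math.ceil(target_mb_s / (per_drive_mb_s * efficiency)))   # 13 drives with 20% loss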

More drives will improve your bandwidth headroom linearly, but saturate after the data width of your bus (32 or 64).

True for PCI. But despite writing PCI, the OP meant PCIe, which is a lot faster: 4 lanes of PCIe 2.0 carry roughly 2 GB/s of usable bandwidth per direction, and 4 lanes of PCIe 1.x roughly 1 GB/s. That should be enough, though a gen-1 x4 link leaves essentially no headroom.
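
The same encoding-overhead arithmetic applies to PCIe 1.x/2.0, so a small sketch makes the headroom situation explicit:

    # Usable PCIe bandwidth per direction; gen 1 and gen 2 also use 8b/10b encoding.
    def pcie_mb_s(lanes, gt_s):
        return lanes * gt_s * 1000 / 10    # GT/s per lane -> usable MB/s

    print(pcie_mb_s(4, 2.5))   # PCIe 1.x x4: ~1000 MB/s, essentially no headroom over 1 GB/s
    print(pcie_mb_s(4, 5.0))   # PCIe 2.0 x4: ~2000 MB/s, comfortable headroom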
