BSD people are really hardcore and often do surprising things :-) Removing the block device layer is, in my opinion, not a problem (NFS, for example, doesn't have an underlying block device either), but this reasoning isn't really directed against block devices — it's against write caching. And removing the write cache is, in my opinion, a very, very bad thing. If your process writes something to the disk, does it not get control back until the write has succeeded?
But I don't think they didn't know what they were doing. Hopefully somebody will explain their reasoning in another answer.
To explain this clearly, I need to explain how filesystems work. A filesystem driver is essentially a translation layer between filesystem operations (directory open, file creation, read, write, deletion, etc.) and block operations (for example: "write page 0xfce2ea31 to disk block 0xc0deebed").
But block operations don't reach the hard disk immediately. First, they go to the block cache. This means that if the filesystem wants to write a memory page to disk, it first writes it into a reserved memory area. The kernel's memory management then writes this data out to the hard disk when it considers it optimal. This enables various speed improvements: for example, if many write operations happen at the beginning and at the end of the disk, the kernel can combine them in such a way that the disk head has to reposition itself as seldom as possible.
There is another improvement: when your program writes into a file, the operation appears as fast as if it were writing to a ramdisk. Of course, this only works until the system's RAM fills up; after that, writers must wait for the write cache to drain. But that only happens when there are many write operations at once (for example, when you are copying large files).
Among filesystems, there is a big difference between those that run on a disk (i.e. on block devices) and those that don't (e.g. NFS). In the second case, block caching is impossible, because there are no blocks. Such filesystems instead use a so-called "buffer cache", which is still a cache (for both reads and writes), but it is organized not around memory blocks but around I/O fragments of arbitrary size.
Yes, Linux has "raw" block devices which allow disk devices to be used without the block caching mechanism. But they don't solve this problem.
Instead, there are the so-called "journaling filesystems". With a journaling filesystem, the filesystem can instruct the kernel which pages must be written out before others. If a filesystem has no journaling mechanism, it simply writes blocks to the disk (more precisely: to the block cache), and the kernel performs the actual write operation whenever it considers it optimal.
You can imagine a journaling filesystem as if every write operation happened twice: first into a "journal", which is a reserved area on the disk, and only after that to its real location. In case of a system crash or disk error, the last undamaged state of the disk can be reconstructed very quickly and easily from the journal.
Done naively, this would significantly decrease write performance, because every write would have to be done twice. This is why, in reality, journaling filesystems work in a much more complex way; they use various sophisticated data-structure manipulations to reduce this overhead to a nearly invisible level. But this is hard: for example, the major improvement of ext3 over ext2 was the inclusion of journaling, which multiplied its code size.
In Linux, the block layer API has a "barrier" mechanism. Filesystems can set up "barriers" between their write operations. A barrier means that data after the barrier will be written to the disk only after all data before the barrier has already been written out. Journaling filesystems use the barrier mechanism to tell the block layer the required ordering of the actual write operations. As far as I know, they don't use raw device mapping.
I don't know what FreeBSD does in this case. Maybe their elimination of block devices only means that everything goes through the buffer cache rather than the block cache. Or they have something that isn't described here. In filesystem internals, there are very big differences between the *BSD and Linux worlds.
Best Answer
The main loop of GNU cat, in the simplest case, is the function `simple_cat` in `cat.c`. The question then becomes: how is `bufsize` set? The answer is that it uses `io_blksize` (in `insize = io_blksize (stat_buf)`), which combines `ST_BLKSIZE` and `IO_BUFSIZE`: `ST_BLKSIZE` gives the operating system's idea of the file system's preferred I/O block size (as obtained with `stat`), and `IO_BUFSIZE` is defined as 128*1024 (128 KB). The Linux `stat` syscall documentation describes this field (`st_blksize`) as the preferred block size for efficient file system I/O. So it seems that GNU cat reads in blocks of 128 KB or the file system's recommended I/O block size, whichever is larger.