Are the size of a memory page and the size of a file system cluster always the same

filesystemslinuxmemorywindows

From http://en.wikipedia.org/wiki/Page_%28computer_memory%29

A page, memory page, or virtual page is a fixed-length contiguous block of virtual memory, described by a single entry in the page
table. It is the smallest unit of data for memory allocation performed
by the operating system on behalf of a program, and for transfers
between the main memory and any other auxiliary store, such as a hard
disk drive.

From http://en.wikipedia.org/wiki/Data_cluster

In computer file systems, a cluster or allocation unit is a unit of disk space allocation for files and directories. To reduce the overhead of managing on-disk data structures, the filesystem does not allocate individual disk sectors by default, but contiguous groups of sectors, called clusters.

I wonder if the size of a memory page and the size of a file system cluster in the same computer system (hardware and OS, in particular Linux, Windows, Mac) are always the same? Thanks.

Best Answer

Always? No. Often? Yes, which is of course convenient. (Notice no claim for "usually".)

For example, with Windows:

just like in Win32, the x64 page size is 4KB

And for NTFS, the default cluster size is 4KB for disks up to 16TB. But (1) that's just the default; (2) for really large disks, the default is larger; and (3) there are other file systems.

Related Solutions

Memory – Difference Between Sector and Cluster

The advantage of file systems considering a cluster/allocation unit/block as the smallest unit, is because addressing the entire disk per-sector would require a larger number of bits to index it all. This larger number of bits would make it slower, because there are a larger number of addresses and things to keep track of. It's far more efficient to address (and index!) locations using say, 48 bits (2^48 = 2.8e14), as opposed to 64 or more bits (2^64 = 1.8e16) for each single access of the device.

But yes, cluster size or allocation unit size (windows) or block size (Linux) is adjustable depending on the file system defined, and that is the smallest size that can normally be accessed by an OS to store file data. "Defining a filesystem" means to format the disk (or the specifications of that format), so implies erasure of all data on the disk. So on a disk with a cluster size of 4kiB, a 1-byte file would indeed take up an entire 4k cluster as in your example. Yes, the OS could write to some specific sector within that cluster, but the file will still use the same sectors of that cluster (file size will always be a multiple of cluster size, regardless of what data is in it.) To change that cluster size, means to re-format the disk, and is why all data must be erased.

Incidentally, smaller cluster sizes store small files more efficiently. However, the disk will run slower overall as a consequence, because of the increased number of clusters. When your PC is just sitting there grinding on the disk for a long time, this is because it's trying to read or write so many small blocks, and the sheer number of them slows everything down.

Ex: 100,000 768-byte files, stored on a disk with 1kiB clusters:

768kB bytes of actual file data
1.024MB of the disk used, because each file uses 1024 bytes of the disk.
Space efficiency = 0.768/1.024 = 75% (not bad...)

And likewise, larger clusters are better for disks with fewer, larger files on them like movies, images, and audio. Since there are fewer clusters, the disk is generally faster. But be careful putting lots of small files on it:

Ex: 100,000 768-byte files, stored on a disk with 64kiB clusters:

768kB bytes of actual file data
6.55GB of the disk used, because each file uses 65535 bytes of the disk.
Space efficiency = 0.768/6553.5 = 0.00017% !!!

Disks with mixed content, such as an operating system, generally have medium-to-small cluster/block sizes, as most of the files are medium-to-small in size. The end result is a compromise between space utilization and speed.

The disks themselves prefer anywhere from 32kB to 256kB blocks, as that allows them to transfer the most data per second.

This is all concerning traditional mechanical, rotating-platter magnetic-storage hard disks. SSD's or Solid-State Drives are quickly replacing traditional hard disks and boast much faster read/write/seek speeds. So is cluster size important on a SSD today? Well I'd say it is less important to the average user, but only because the SSD (and modern computers) are much faster already. Who is going to notice a SSD slow-down of 10% when already 5x faster than a magnetic hard disk?

What might influence the cluster size on a SSD more is the throughput. You might find (by formatting and benchmarking) that a certain cluster size works far better than others for that SSD. For example, some SSD's are optimized for 8kiB or 4kiB transfers. This has to do with how big a block of data the electronics inside are prepared to transfer per request. Match what the OS is attempting to use (cluster size) with the optimal size for that SSD = fastest transfer speed.

Cluster size is still important for file "overhead" reasons on SSD's however.

I've found a great tool for benchmarking SSD's is AS-SSD for Windows and these on Linux.

Disambiguation: logical / physical block

I understand BLOCK generally means a batch of data seen as a whole I/O unit

The use of the term "block" is widespread in computing, and not restricted to I/O.

What I thought a logical block, or a filesystem block refers to is the minimal I/O unit seen as a whole used by specific filesystem, in order to, as the essence of any batch operation, reduce the overheads brought by reading or writing one sector one time.

IMO it would be careless to make sweeping definitions that encompass all filesystems. Be aware that the I/O blocksize for file data could be different from filesystem metadata. E.G. writes to a file could be consolidated into 4KB (or larger) blocks, but the filesystem journal may need to be written more often (with a smaller block) to ensure data retention.

"Batch operation" is old jargon, and you're using the term in a nonsensical manner.

And a physical block is exactly the synonym of a disk sector.

Only in the context of disk drives.
Magnetic tape requires I/O to be performed in physical blocks, but there is no concept of sectors with tape.

What's more, I believe the word CLUSTER is just filesystem block in Microsoft's fashion, as suggested by this thread on Reddit.

A "cluster" is a unit of allocation in MS filesystems.
Whether I/O is always performed in that same blocksize is questionable. E.G. When the cluster size is 64KB, and the entire file is just 128 bytes, is the filesystem going to write 128 sectors or optimize the I/O to just one sector?

So, is LBA just a fancy word which actually concerns addressing disk sectors?

Essentially yes (for legacy 512-byte sectors).

The integrated controller of the modern disk drive performs the mapping of LBA to physical sector. The actual cylinder, head, and sector that maps to a particular LBA is known only to the drive so that any type of zone-bit recording and relocation for bad sectors can be implemented by the disk drive.

With Advanced Format 512e HDDs that use 4096-byte sectors and a 512-byte transfer size, the term LBA is truly accurate: the address is not of a physical sector, but rather for a logical block consisting of an eighth of a sector.

Or does LBA-compliant disks really understand the concept of filesystem/logical block,

I'm not sure what you mean by "filesystem/logical block", but the answer is probably no.
It's simply a storage device with no concept of organizing the raw data it is storing.

See What kind of api does a sata hard-drive expose?

and is capable of making block-level I/O, thus hide the existence of "sector" from operating system?

The concept of sector (or physical block) cannot be eliminated, simply because that is the minimum unit of I/O. The lowest-levels of the OS (i.e. the device drivers) will always be cognizant of the hardware attributes. But each abstraction layer of the OS will try to obscure those details. So when you read a file, you may not know if it was retrieved from a HDD or DVD or over a network.

FWIW disk controllers (even old ones that used CHS addressing) can perform multi-sector read or write operations, e.g. perform a read of N sequential sectors.

Best Answer

Related Solutions

Memory – Difference Between Sector and Cluster

Disambiguation: logical / physical block

Related Question