How do databases deal with file system fragmentation?

disk-space, fragmentation, tablespaces, vacuum

I'm especially interested in analytic databases, which try to optimize for queries that scan large portions of a table rather than randomly accessing specific rows. For example, Redshift has the concept of a sort key: provided you VACUUM after major inserts/updates/deletes, the data is kept sorted on disk according to that key. Ideally this minimizes disk seeks and speeds up large scans.
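
For concreteness, here is a minimal Redshift-style sketch (the table and column names are invented) of declaring a sort key and then re-sorting the table with VACUUM after heavy churn:

    -- Hypothetical fact table, kept sorted on disk by event_time.
    CREATE TABLE events (
        event_id   BIGINT,
        event_time TIMESTAMP,
        payload    VARCHAR(256)
    )
    COMPOUND SORTKEY (event_time);

    -- ...bulk loads, deletes, and updates happen here...

    -- Re-sort the rows and reclaim space from deleted rows so that
    -- range scans on event_time read largely sequential, sorted blocks.
    VACUUM FULL events;

    -- Or re-sort only, without reclaiming space from deleted rows:
    VACUUM SORT ONLY events;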

However, my understanding is that a database writes its data blocks to a set of database files, and those files are managed by the file system, which has its own block allocation scheme that is subject to fragmentation. So how does the database (or, more precisely, how does something like VACUUM) manage this so that database blocks actually end up sorted correctly on disk?

I'm guessing it has to do with the fact that most databases run on servers where not much else is running? Or that the database is usually given its own disk partitions? Or that the database files reserve a large amount of space ahead of time? In any of these cases there aren't many other processes writing to the same disk space as the database, so there isn't much fragmentation outside the database's control?

Best Answer

Depending on the operating system and the DBMS in use, there are varying degrees of cooperation between them.

For instance, using Sybase ASE on AIX, you can tell the DBMS to allocate an entire file system as a single monolithic structure for use ONLY by the DBMS. In this instance, the database is the file system, and file-system-level fragmentation is a non-issue.
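
One way ASE does this on AIX is to create a database device directly on a raw logical volume, so the DBMS manages all allocation inside it and no file system is involved at all. A rough sketch, with invented device and volume names (the exact size syntax varies by ASE version; older releases take a count of 2 KB virtual pages rather than a unit suffix):

    -- Hypothetical: register a raw AIX logical volume as an ASE database device.
    disk init
        name     = "data_dev1",
        physname = "/dev/rdatalv01",
        size     = "8G"

    -- Create a database on that device; there is no file system
    -- underneath it to fragment.
    create database salesdb on data_dev1 = "8G"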

For SQL Server, and probably most other DBMSs, database files are stored on a pre-existing file system that may or may not contain other files. Even if the file system containing the database files holds no files from any other system, the database files may still become fragmented simply through growth of the files themselves.
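
The usual mitigation on SQL Server is to size the files generously up front and use a fixed growth increment, so growth events, and the fragments they create, are rare. A sketch with illustrative names, paths, and sizes:

    -- Hypothetical database, pre-sized so routine activity rarely
    -- has to grow the files on disk.
    CREATE DATABASE SalesDW
    ON PRIMARY
    (
        NAME       = SalesDW_data,
        FILENAME   = 'D:\Data\SalesDW.mdf',
        SIZE       = 50GB,
        FILEGROWTH = 4GB
    )
    LOG ON
    (
        NAME       = SalesDW_log,
        FILENAME   = 'L:\Logs\SalesDW.ldf',
        SIZE       = 10GB,
        FILEGROWTH = 1GB
    );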

In a file system that contains both database files and other files, file-growth operations on the database files will almost certainly leave them physically fragmented. As far as I am aware, no DBMS integrates directly with the operating system to prevent physical file fragmentation.
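
If a database was created with a small growth increment, raising it at least means fewer, larger growth events, each of which tends to leave one big extent on disk rather than many small ones. A sketch, reusing the hypothetical database from above:

    -- Fewer, larger growth events generally mean fewer, larger physical
    -- fragments; note this does not defragment the existing file.
    ALTER DATABASE SalesDW
        MODIFY FILE (NAME = SalesDW_data, FILEGROWTH = 8GB);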

Many larger production database systems run on highly complex infrastructure involving virtualization of the physical machines and storage area networks (SANs). These systems are designed to share resources across the machines using them. In the case of databases stored on a SAN, there is a very good chance the physical blocks containing the database files are spread across a large number of physical drives and are unlikely to be contiguously allocated. Brent Ozar puts it this way: "If you’re using shared storage like EMC, NetApp, or Dell gear that shares drives between a bunch of different servers, then all of your drive access will be random anyway. Your hard drives are shared with other servers that are also making drive requests at the same time, so the drives will always be jumping all over the place to get data." [1]