It's clear to me how deleting open files works on filesystems that use inodes – unlink() just decreases the link count to zero, and when the last file handle to the file is closed, the inode will be removed.
But how does it work when using a file system that doesn't use inodes, like FAT32, with Linux?
Some experiments suggest that deleting open files is still possible (unlike on Windows, where the unlink call wouldn't succeed), but what happens when the file system is uncleanly unmounted?
How does Linux mark the files as unlinked, when the file system itself doesn't support such an operation? Is the directory entry just deleted, but retained in memory (that would guarantee deletion after unmounting in any case, but would leave the file system in an inconsistent state), or will the deletion only be marked in memory, and written at the time the last file handle is closed, avoiding possible corruption, but restoring the deleted files after an unclean unmount?
Best Answer
You are correct in your assumption that while all directory entries are deleted immediately after calling unlink(), the actual blocks that physically make up the file are only cleared on disk when nothing is using the inode anymore. (I say "directory entries" because in vfat, a file can actually have several of those, because of how vfat's long file name support is implemented.)
In this context, by inode, I mean the structure in memory that the Linux kernel uses for handling files. It is used even when the filesystem is not "inode based". In the case of vfat, the inode is simply backed by some blocks on disk.
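The behaviour is easy to observe from user space. The following is a minimal sketch, not taken from the kernel or from the original post: the helper name `nlink_after_unlink` and the file name template are made up for illustration, and the expectation that `st_nlink` reads 0 afterwards is an assumption that holds on typical local Linux filesystems. It creates a file, unlinks it while a descriptor is still open, and checks that the name is gone while the data stays readable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a file in `dir`, unlink it while it is
 * still open, and return the link count reported through the open
 * descriptor. Returns -1 if any step fails or behaves unexpectedly.
 */
long nlink_after_unlink(const char *dir)
{
    static const char msg[] = "still here";
    char buf[sizeof msg];
    char path[4096];
    struct stat st;

    assert(dir != NULL);
    int n = snprintf(path, sizeof path, "%s/unlink-demo-XXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    int fd = mkstemp(path);              /* creates and opens the file */
    if (fd < 0)
        return -1;

    if (write(fd, msg, sizeof msg) != (ssize_t)sizeof msg)
        goto fail;
    if (unlink(path) != 0)               /* remove the name, keep the fd */
        goto fail;
    if (access(path, F_OK) == 0)         /* the name must be gone now */
        goto fail;

    /* The data is still reachable through the open descriptor. */
    if (pread(fd, buf, sizeof buf, 0) != (ssize_t)sizeof buf)
        goto fail;
    if (memcmp(buf, msg, sizeof msg) != 0)
        goto fail;

    if (fstat(fd, &st) != 0)
        goto fail;
    close(fd);                           /* last reference: storage can be freed */
    return (long)st.st_nlink;

fail:
    close(fd);
    unlink(path);                        /* best-effort cleanup */
    return -1;
}
```

Pointing the helper at the mount point of a vfat volume instead of `/tmp` is one way to repeat the experiment from the question.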
Taking a look at the Linux kernel source code, we see that `vfat_unlink`, which implements the `unlink()` system call for vfat, does roughly the following (extremely simplified for illustration):
1. `fat_remove_entries` simply removes the entry for the file in its directory.
2. `clear_nlink` sets the link count for the inode to `0`, which means that no file (i.e. no directory entry) points to this inode anymore.
Note that at this point, neither the inode nor its physical representation is touched in any way (except for the cleared link count), so it still happily exists in memory and on disk, as if nothing happened!
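The two steps can be modelled in a few lines of plain C. This is a toy sketch under stated assumptions, not kernel code: `toy_inode`, `toy_dirent` and `toy_unlink` are invented names, and the only thing borrowed from the kernel is the order of operations described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; these are NOT the kernel's real structures. */
struct toy_inode {
    unsigned nlink;      /* directory entries naming this file */
    unsigned open_count; /* open file descriptors */
    size_t   clusters;   /* clusters still allocated on disk */
};

struct toy_dirent {
    bool in_use;
    struct toy_inode *inode;
};

/* Same order as the two steps above: drop the entry, clear the count. */
void toy_unlink(struct toy_dirent *de)
{
    assert(de != NULL && de->in_use && de->inode != NULL);
    struct toy_inode *inode = de->inode;

    de->in_use = false;  /* plays the role of fat_remove_entries */
    de->inode = NULL;
    inode->nlink = 0;    /* plays the role of clear_nlink */
    /* inode->clusters is deliberately left alone: nothing is freed yet. */
}
```

After `toy_unlink`, the entry is gone and `nlink` is 0, but `clusters` is unchanged, which is exactly the in-between state the answer describes.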
(By the way, it's also interesting to note that `vfat_unlink` always sets the link count to `0` instead of just decrementing it using `drop_nlink`. This should tip you off that FAT filesystems do not support hard links! It is also a further indication that FAT itself does not know of any separate inode concept.)
Now let's take a look at what happens when the inode is evicted.
`evict_inode` is called when we do not want the inode in memory anymore. At its earliest, this can of course only happen when no process holds any open file descriptor to that inode anymore (but may in theory also happen at a later time). The FAT implementation of `evict_inode` is (again, simplified) built around a single `if` clause, and the magic happens exactly within it: if the inode's link count is 0, no directory entry is pointing to it anymore. So we set its size to 0 and truncate it down to 0 bytes, which deletes it from disk by freeing the blocks it was made of.
So, the corruption you are experiencing in your experiments is easily explained: just as you suspected, the directory entry has already been removed (by `vfat_unlink`), but because the inode wasn't evicted yet, the actual blocks were still untouched and were still marked as used in the FAT (File Allocation Table). `fsck.vfat`, however, detects that there is no directory entry pointing to those blocks anymore, complains, and repairs it.
By the way, `CHKDSK` would not just clear those blocks by marking them as free, but create new files in the root directory pointing to the first block in each chain, with names like `FILE0001.CHK`.
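To make the difference between a clean eviction and an unclean unmount concrete, here is a toy model of a FAT. It is a sketch under stated assumptions, not how the kernel or `fsck.vfat` is implemented: every `toy_` name is hypothetical, the volume holds a single file, chains are assumed to be acyclic, and clusters 0 and 1 are treated as reserved. Eviction frees the chain only when the link count is 0, and the scan counts clusters that are marked as used but not reachable from any directory entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CLUSTERS 8
#define TOY_FREE 0u          /* cluster is unallocated */
#define TOY_EOC  0xFFFFFFFFu /* end-of-chain marker */

/* Toy volume with one file; clusters 0 and 1 are treated as reserved. */
struct toy_volume {
    unsigned fat[TOY_CLUSTERS]; /* fat[c] = next cluster, TOY_FREE or TOY_EOC */
    bool     entry_present;     /* the directory entry still exists */
    unsigned start;             /* first cluster of the file */
    unsigned nlink;             /* in-memory link count */
};

/* A file in clusters 2 -> 3 -> 5 that was unlinked but not yet evicted. */
struct toy_volume toy_unlinked_volume(void)
{
    struct toy_volume v = { .entry_present = false, .start = 2, .nlink = 0 };
    v.fat[2] = 3;
    v.fat[3] = 5;
    v.fat[5] = TOY_EOC;
    return v;
}

/* Eviction: free the cluster chain only if nothing links to the file. */
void toy_evict(struct toy_volume *v)
{
    assert(v != NULL);
    if (v->nlink != 0)
        return;
    for (unsigned c = v->start; c >= 2 && c < TOY_CLUSTERS; ) {
        unsigned next = v->fat[c];
        v->fat[c] = TOY_FREE;
        c = next;
    }
}

/* Consistency scan: clusters marked used that no directory entry reaches. */
size_t toy_count_lost(const struct toy_volume *v)
{
    bool reachable[TOY_CLUSTERS] = { false };
    size_t lost = 0;

    if (v->entry_present)
        for (unsigned c = v->start; c >= 2 && c < TOY_CLUSTERS; c = v->fat[c])
            reachable[c] = true;

    for (unsigned c = 2; c < TOY_CLUSTERS; c++)
        if (v->fat[c] != TOY_FREE && !reachable[c])
            lost++;
    return lost;
}
```

Scanning the volume returned by `toy_unlinked_volume()` directly stands in for an unclean unmount and reports three lost clusters, while calling `toy_evict` first leaves nothing for the scan to find.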