Linux: (How) does deleting open files work on Linux with a FAT file system?


It's clear to me how deleting open files works on filesystems that use inodes: unlink() decrements the inode's link count (to zero, if that was the file's only name), and once the last file handle to the file is closed, the inode is removed.
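This behaviour can be observed from user space. The following is a minimal sketch assuming a POSIX system; the function name `unlink_while_open` and the file name `unlink-demo.tmp` are hypothetical. It removes a file's only name while a descriptor is still open, then reads the data back through that descriptor and reports the link count that fstat() returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo helper: creates dir/unlink-demo.tmp, writes msg into it,
 * unlinks it while the descriptor is still open, then reads the data back
 * through that descriptor. out must hold strlen(msg) + 1 bytes.
 * Returns the link count fstat() reports after unlink(), or -1 on error. */
static long unlink_while_open(const char *dir, const char *msg, char *out)
{
    char path[4096];
    size_t len = strlen(msg);
    snprintf(path, sizeof path, "%s/unlink-demo.tmp", dir);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    struct stat st;
    long result = -1;
    if (write(fd, msg, len) == (ssize_t)len &&
        unlink(path) == 0 &&                  /* the name is gone now...  */
        fstat(fd, &st) == 0 &&                /* ...but the file is not   */
        pread(fd, out, len, 0) == (ssize_t)len) {
        out[len] = '\0';
        result = (long)st.st_nlink;           /* expected: 0 names left   */
    }
    close(fd); /* last reference dropped: only now may the data be freed */
    return result;
}
```

For example, `unlink_while_open("/tmp", "hello", buf)` with `char buf[6]` should return 0 and leave "hello" in `buf`, even though the path no longer exists at that point.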

But how does this work on Linux with a file system that doesn't use inodes, such as FAT32?

Some experiments suggest that deleting open files is still possible (unlike on Windows, where the unlink call wouldn't succeed), but what happens when the file system is uncleanly unmounted?

How does Linux mark the files as unlinked when the file system itself doesn't support such an operation? Is the directory entry deleted immediately while the file is retained in memory (which would guarantee deletion after unmounting in any case, but would leave the file system in an inconsistent state)? Or is the deletion only recorded in memory and written to disk when the last file handle is closed (avoiding possible corruption, but restoring the deleted files after an unclean unmount)?

Best Answer

You are correct in assuming that all directory entries are deleted immediately when unlink() is called, while the blocks that physically make up the file are only freed on disk once nothing is using the inode anymore. (I say "directory entries" because in vfat a file can actually have several of them, due to how vfat's long file name support is implemented.)
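The number of directory entries follows from the VFAT on-disk layout: each long-file-name (LFN) entry holds up to 13 UTF-16 code units of the name, and the ordinary 8.3 entry follows them. A small sketch, assuming the name actually needs LFN entries (a name that fits the 8.3 format unchanged may need only the short entry); `vfat_dir_entries` is a hypothetical helper:

```c
#include <stddef.h>

/* Hypothetical helper: number of 32-byte directory entries used by a VFAT
 * file whose name needs long-file-name (LFN) entries. Each LFN entry stores
 * up to 13 UTF-16 code units; one 8.3 short entry always follows them.
 * name_units is the long name's length in UTF-16 code units (1..255). */
static size_t vfat_dir_entries(size_t name_units)
{
    size_t lfn_entries = (name_units + 12) / 13; /* ceil(name_units / 13) */
    return lfn_entries + 1;                      /* + the 8.3 short entry  */
}
```

So a 255-character name occupies 20 LFN entries plus the short entry, 21 slots in total, and all of them are removed when the file is unlinked.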

In this context, by inode, I mean the structure in memory that the Linux kernel uses for handling files. It is used even when the filesystem is not "inode based". In the case of vfat, the inode is simply backed by some blocks on disk.

Taking a look at the Linux kernel source code, we see that vfat_unlink, which implements the unlink() system call for vfat, does roughly the following (extremely simplified for illustration):

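static int vfat_unlink(struct inode *dir, struct dentry *dentry)
{
        struct inode *inode = d_inode(dentry); /* the file's in-memory inode */
        struct fat_slot_info sinfo;            /* where its entries live */

        /* (lookup of sinfo via vfat_find(), locking and error handling omitted) */
        fat_remove_entries(dir, &sinfo);       /* delete the directory entries */
        clear_nlink(inode);                    /* set the link count to 0 */
        return 0;
}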

So what happens is:

  1. fat_remove_entries removes the file's directory entries (including any long-file-name entries) from its parent directory.
  2. clear_nlink sets the inode's link count to 0, which means that no name (i.e. no directory entry) points to this inode anymore.

Note that at this point, neither the inode nor its physical representation is touched in any way (apart from the link count being cleared), so the file still happily exists in memory and on disk, as if nothing had happened!
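The state left behind by vfat_unlink can be modelled in a few lines. This is a hypothetical toy model, not kernel code: the FAT is an array of next-cluster numbers, `FAT_EOC` is an arbitrary end-of-chain marker chosen for the sketch, and there is only a single directory entry.

```c
#include <stdbool.h>
#include <stddef.h>

#define NCLUSTERS 16
#define FAT_FREE 0u
#define FAT_EOC 0xFFFFFFFFu /* arbitrary end-of-chain marker for this sketch */

/* Hypothetical toy model: a tiny FAT, one directory entry, and the
 * in-memory inode that belongs to it. */
struct toy_fs {
    unsigned fat[NCLUSTERS]; /* fat[c] = next cluster, FAT_EOC or FAT_FREE */
    bool dirent_present;     /* is the file's directory entry on "disk"?   */
    unsigned dirent_start;   /* first cluster recorded in that entry       */
    unsigned inode_nlink;    /* link count of the in-memory inode          */
};

/* Mirrors the two steps of vfat_unlink. The cluster chain in the FAT is
 * deliberately left alone. */
static void toy_unlink(struct toy_fs *fs)
{
    fs->dirent_present = false; /* like fat_remove_entries() */
    fs->inode_nlink = 0;        /* like clear_nlink()        */
}

/* Counts clusters that are still marked as used in the FAT. */
static size_t toy_used_clusters(const struct toy_fs *fs)
{
    size_t used = 0;
    for (unsigned c = 2; c < NCLUSTERS; c++) /* clusters 0 and 1 are reserved */
        if (fs->fat[c] != FAT_FREE)
            used++;
    return used;
}
```

After toy_unlink(), the directory entry is gone and the link count is 0, yet toy_used_clusters() still counts every cluster of the file's chain.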

(By the way, it's also interesting to note that vfat_unlink always sets the link count to 0 instead of just decrementing it using drop_nlink. This should tip you off that FAT filesystems do not support hard links, and it is further indication that FAT itself has no separate inode concept.)
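For contrast, on a filesystem with real hard links, removing one of two names only decrements the count. A minimal POSIX sketch follows; the helper name and the `nlink-demo-*` file names are hypothetical. Run against an ext4 or tmpfs directory such as /tmp it should return 1, while on a vfat mount the link() call is expected to fail, since vfat provides no link operation.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo helper: creates dir/nlink-demo-a, adds a second name
 * dir/nlink-demo-b with link(), removes the first name, and returns the
 * link count that remains, or -1 on error (e.g. no hard link support). */
static long links_after_one_unlink(const char *dir)
{
    char a[4096], b[4096];
    snprintf(a, sizeof a, "%s/nlink-demo-a", dir);
    snprintf(b, sizeof b, "%s/nlink-demo-b", dir);
    unlink(b); /* clear leftovers from earlier runs; failure is fine */

    int fd = open(a, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    struct stat st;
    long result = -1;
    if (link(a, b) == 0 &&  /* link count: 1 -> 2               */
        unlink(a) == 0 &&   /* link count: 2 -> 1, not cleared  */
        fstat(fd, &st) == 0)
        result = (long)st.st_nlink;
    close(fd);
    unlink(b);
    return result;
}
```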

Now let's look at what happens when the inode is evicted. evict_inode is called when the inode is being dropped from memory. At the earliest, this can only happen once no process holds an open file descriptor to that inode anymore (though in theory it may also happen later). The FAT implementation of evict_inode looks (again, simplified) like this:

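static void fat_evict_inode(struct inode *inode)
{
        /* drop the file's pages from the page cache */
        truncate_inode_pages(&inode->i_data, 0);
        if (!inode->i_nlink) {
                /* no directory entry refers to this inode anymore: */
                inode->i_size = 0;
                /* give all of its clusters back to the FAT */
                fat_truncate_blocks(inode, 0);
        }
        /* release the remaining in-memory state of the inode */
        invalidate_inode_buffers(inode);
        clear_inode(inode);
}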

The magic happens inside the if-clause: if the inode's link count is 0, no directory entry points to it anymore. So its size is set to 0 and the file is truncated to 0 bytes, which deletes it from disk by marking the clusters it occupied as free in the FAT.
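The interplay between open descriptors, the link count and eviction can be sketched with a hypothetical toy model (not kernel code). `toy_close` and `toy_evict` only mimic the behaviour described above; in this model eviction happens immediately on the last close, whereas the kernel may do it later.

```c
#define NCLUSTERS 16
#define FAT_FREE 0u
#define FAT_EOC 0xFFFFFFFFu /* arbitrary end-of-chain marker for this sketch */

/* Hypothetical toy model of an in-memory inode. */
struct toy_inode {
    unsigned nlink;    /* directory entries pointing here (0 or 1 on FAT) */
    unsigned open_fds; /* open file descriptors referring to the inode    */
    unsigned start;    /* first cluster of the file's chain                */
};

/* Roughly what fat_truncate_blocks(inode, 0) achieves: walk the chain and
 * mark every cluster in it as free. */
static void toy_free_chain(unsigned fat[NCLUSTERS], unsigned start)
{
    unsigned c = start;
    while (c >= 2 && c < NCLUSTERS) { /* stops at FAT_EOC or FAT_FREE */
        unsigned next = fat[c];
        fat[c] = FAT_FREE;
        c = next;
    }
}

/* Mirrors the if-clause in fat_evict_inode: only an inode without any
 * directory entry gives its clusters back. */
static void toy_evict(struct toy_inode *ino, unsigned fat[NCLUSTERS])
{
    if (ino->nlink == 0)
        toy_free_chain(fat, ino->start);
}

/* Closing the last descriptor evicts the inode immediately in this model. */
static void toy_close(struct toy_inode *ino, unsigned fat[NCLUSTERS])
{
    if (ino->open_fds > 0 && --ino->open_fds == 0)
        toy_evict(ino, fat);
}
```

Closing the first of two descriptors leaves the chain untouched; only the last close, on an inode whose link count is already 0, frees it. A file that still has a name keeps its clusters.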

So the corruption you are seeing in your experiments is easily explained: just as you suspected, the directory entry has already been removed (by vfat_unlink), but because the inode had not been evicted yet, the file's blocks were untouched and still marked as used in the FAT (File Allocation Table). fsck.vfat, however, detects that no directory entry points to those blocks anymore, complains, and repairs the inconsistency.
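What fsck.vfat looks for here can be illustrated with a reachability check: follow every chain that starts at a directory entry, then report used clusters that were never reached. This is a hypothetical sketch of the idea, not fsck.vfat's actual code; it uses a toy FAT (an array of next-cluster numbers with an arbitrary `FAT_EOC` marker).

```c
#include <stdbool.h>
#include <stddef.h>

#define NCLUSTERS 16
#define FAT_FREE 0u
#define FAT_EOC 0xFFFFFFFFu /* arbitrary end-of-chain marker for this sketch */

/* Hypothetical sketch: counts clusters the FAT marks as used that cannot be
 * reached from any directory entry. starts[] holds the first cluster of
 * every directory entry that still exists. */
static size_t count_lost_clusters(const unsigned fat[NCLUSTERS],
                                  const unsigned *starts, size_t nstarts)
{
    bool reachable[NCLUSTERS] = {false};
    for (size_t i = 0; i < nstarts; i++) {
        unsigned c = starts[i];
        /* follow the chain; stop at end-of-chain, free, or already seen */
        while (c >= 2 && c < NCLUSTERS && !reachable[c]) {
            reachable[c] = true;
            c = fat[c];
        }
    }

    size_t lost = 0;
    for (unsigned c = 2; c < NCLUSTERS; c++)
        if (fat[c] != FAT_FREE && !reachable[c])
            lost++;
    return lost;
}
```

With one listed file at cluster 2 and the unlinked file's chain at clusters 5 and 6 still allocated, the function reports 2 lost clusters.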

By the way, CHKDSK would not just free those blocks, but would create new files in the root directory pointing to the first block of each chain, with names like FILE0001.CHK.
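The CHKDSK approach can be sketched as well: among the lost clusters, a chain starts at each cluster that no other lost cluster links to. The helper below is hypothetical and is not CHKDSK's code; the `lost[]` array is assumed to come from a reachability pass like the one fsck performs, and numbering from 1 merely mirrors the example name in the text.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define NCLUSTERS 16
#define FAT_EOC 0xFFFFFFFFu /* arbitrary end-of-chain marker for this sketch */

/* Hypothetical sketch: finds the first cluster of every lost chain and gives
 * each one a CHKDSK-style name. Writes up to max results into heads[] and
 * names[] and returns how many chains were found. */
static size_t name_lost_chains(const unsigned fat[NCLUSTERS],
                               const bool lost[NCLUSTERS],
                               unsigned heads[], char names[][20], size_t max)
{
    size_t n = 0;
    for (unsigned c = 2; c < NCLUSTERS && n < max; c++) {
        if (!lost[c])
            continue;
        /* c starts a chain if no other lost cluster points to it */
        bool has_pred = false;
        for (unsigned d = 2; d < NCLUSTERS; d++)
            if (lost[d] && d != c && fat[d] == c)
                has_pred = true;
        if (!has_pred) {
            heads[n] = c;
            snprintf(names[n], 20, "FILE%04u.CHK", (unsigned)(n + 1));
            n++;
        }
    }
    return n;
}
```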
