Linux fstat – Does fstat Require Disk Access on Linux/ext4?

Tags: c, files, filesystems, linux, linux-kernel

Linux Kernel 5.3

Consider the fstat syscall defined as int fstat(int fd, struct stat *statbuf);. Is disk access required for the fstat syscall on ext4?
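For concreteness, here is a minimal user-space sketch of the call in question. The helper name file_size_via_fstat is made up for illustration; it opens a path, calls fstat on the resulting descriptor, and reports st_size.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the question): open `path`, query the
 * open descriptor with fstat(), and store the file size in *size_out.
 * Returns 0 on success, -1 on failure with errno set. */
int file_size_via_fstat(const char *path, off_t *size_out)
{
    struct stat st;
    int fd, rc, saved_errno;

    assert(size_out != NULL);

    fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    /* From here on the kernel works from the descriptor, not the path. */
    rc = fstat(fd, &st);
    saved_errno = errno;
    close(fd);

    if (rc == -1) {
        errno = saved_errno;   /* close() may have overwritten errno */
        return -1;
    }
    *size_out = st.st_size;
    return 0;
}
```

A caller would pass any readable path, for example file_size_via_fstat("/bin/sh", &size), and check the return value before using size.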

I did some research and found the following. The in-kernel entry point for the system call is the function vfs_statx_fd. Here is its implementation:

int vfs_statx_fd(unsigned int fd, struct kstat *stat,
         u32 request_mask, unsigned int query_flags)
{
    struct fd f;
    int error = -EBADF;

    if (query_flags & ~KSTAT_QUERY_FLAGS)
        return -EINVAL;

    f = fdget_raw(fd);
    if (f.file) {
        error = vfs_getattr(&f.file->f_path, stat,
                    request_mask, query_flags);
        fdput(f);
    }
    return error;
}

So the unsigned int fd, which is the actual file descriptor the user passed to the system call, is used to find a pointer to the struct file. The crucial part of its definition is:

struct file {
    //...
    struct path     f_path;
    struct inode    *f_inode;   /* cached value */
    //...
};

So struct file represents an open file, and it contains references to the dentry (through f_path) and to the inode.

Is it true that, given an open file descriptor, we can get all the stats from memory alone, avoiding costly disk access?

Update: I tried flushing the kernel caches with free && sync && echo 3 > /proc/sys/vm/drop_caches && free right before invoking the syscall, and it did not affect the timing of the fstat syscall. So I tend to think that no disk access is required.
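A measurement along those lines could be sketched as follows. This is not the code used for the update above; the helper name avg_fstat_ns and the choice of CLOCK_MONOTONIC are assumptions made for illustration. It opens the file once and then averages the wall-clock cost of repeated fstat calls on the same descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: average wall-clock nanoseconds per fstat() call
 * over `iters` calls on one open descriptor for `path`.
 * Returns a negative value if the file cannot be opened or fstat fails. */
double avg_fstat_ns(const char *path, long iters)
{
    struct timespec t0, t1;
    struct stat st;
    double elapsed_ns;
    int fd;

    assert(iters > 0);

    fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++) {
        if (fstat(fd, &st) == -1) {
            close(fd);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);

    elapsed_ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
               + (double)(t1.tv_nsec - t0.tv_nsec);
    return elapsed_ns / (double)iters;
}
```

To repeat the experiment, one would call this once, drop the caches as root, and call it again. Keep in mind that an open descriptor holds a reference to its inode, so dropping caches should not be able to evict that inode in the first place; an unchanged timing is consistent with the in-memory explanation but does not prove it by itself.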

Best Answer

On an ext4 file system, the function graph starting from vfs_statx_fd is:

 0)               |  vfs_statx_fd() {
 0)               |    __fdget_raw() {
 0)   0.225 us    |      __fget_light();
 0)   0.775 us    |    }
 0)               |    vfs_getattr() {
 0)               |      security_inode_getattr() {
 0)               |        selinux_inode_getattr() {
 0)               |          __inode_security_revalidate() {
 0)               |            _cond_resched() {
 0)   0.216 us    |              rcu_all_qs();
 0)   0.575 us    |            }
 0)   0.945 us    |          }
 0)               |          inode_has_perm() {
 0)   0.356 us    |            avc_has_perm();
 0)   0.709 us    |          }
 0)   2.223 us    |        }
 0)   2.808 us    |      }
 0)               |      vfs_getattr_nosec() {
 0)               |        ext4_file_getattr() {
 0)               |          ext4_getattr() {
 0)   0.203 us    |            generic_fillattr();
 0)   0.600 us    |          }
 0)   1.040 us    |        }
 0)   1.502 us    |      }
 0)   4.854 us    |    }
 0)   6.913 us    |  }

Looking at the implementations of all these functions shows that there’s no provision for disk I/O. As you surmise, the data comes from the cached inode.

See also the fstat(2) manpage which mentions that:

Note: for performance and simplicity reasons, different fields in the stat structure may contain state information from different moments during the execution of the system call. For example, if st_mode or st_uid is changed by another process by calling chmod(2) or chown(2), stat() might return the old st_mode together with the new st_uid, or the old st_uid together with the new st_mode.

(although this has more to do with locking than caching).

With some other file systems, AT_STATX_FORCE_SYNC can be included in the query flags (from user space this is done through statx(2), since fstat itself takes no flags) to force a remote sync; this is supported on Ceph, FUSE, NFS, and VirtualBox guest shared folders.
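As a rough sketch of how that flag is passed from user space, the following uses the statx wrapper that glibc 2.28 and later provide (older systems would need syscall(2) instead). The helper name size_with_sync_mode is invented for this example. With an empty pathname and AT_EMPTY_PATH, statx operates on the descriptor itself, much like fstat.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: stat an open descriptor for `path` through statx(),
 * passing `sync_flag` (AT_STATX_FORCE_SYNC, AT_STATX_DONT_SYNC, or
 * AT_STATX_SYNC_AS_STAT) and storing the size in *size_out.
 * Returns 0 on success, -1 on failure. Assumes glibc >= 2.28. */
int size_with_sync_mode(const char *path, int sync_flag,
                        unsigned long long *size_out)
{
    struct statx stx;
    int fd, rc, saved_errno;

    assert(size_out != NULL);

    fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    /* Empty pathname + AT_EMPTY_PATH: query the descriptor itself. */
    rc = statx(fd, "", AT_EMPTY_PATH | sync_flag, STATX_BASIC_STATS, &stx);
    saved_errno = errno;
    close(fd);

    if (rc == -1) {
        errno = saved_errno;
        return -1;
    }
    if (!(stx.stx_mask & STATX_SIZE))   /* size not reported by the fs */
        return -1;

    *size_out = stx.stx_size;
    return 0;
}
```

On a local file system such as ext4 the flag is expected to make no observable difference, since the attributes already come from the in-memory inode; on a network file system, AT_STATX_FORCE_SYNC may trigger a round trip to the server.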
