What characterizes a file in Linux/Unix

Tags: fifo, files, inode, pipe, socket


A file can be one of many types: regular file, directory, symlink, device file, socket, pipe, fifo, and others I may have missed. For example, a symlink:

$ sudo file /proc/22277/fd/23
/proc/22277/fd/23: broken symbolic link to socket:[7540288]

a socket:

$ sudo ls -l /run/user/1001/systemd/notify
srwxrwxr-x 1 testme testme 0 Feb  6 16:41 /run/user/1001/systemd/notify

  1. Is a file characterized as something that has an inode (in some filesystem, whether in memory or on a secondary storage device)? Do files of every type have inodes? (I guess yes to both questions.)

  2. "Linux's Internet domain socket, transport protocols (TCP/UDP)'s socket and port" seems to say that anything with an open file description is a file. Does something with an open file description necessarily have an inode?

    "Open file description" is a much better term than "file"; you can't define "file". Network sockets and Unix domain sockets are both open file descriptions. A UDS might or might not be associated with something on disk (many conditions affect this). A network socket is never associated with anything on disk.

Thanks.

Best Answer

TL;DR

  • A file is an object on which you can perform some or all of the basic operations (open, read, write, close) and whose metadata is stored in an inode.
  • File descriptors are references to those objects.
  • An open file description (yes, the "open" part is important) records how a file, referenced by at least one file descriptor, has been opened.

File As Abstraction

Let's consult the POSIX 2017 definitions, section 3.164, for how File is defined:

An object that can be written to, or read from, or both. A file has certain attributes, including access permissions and type. File types include regular file, character special file, block special file, FIFO special file, symbolic link, socket, and directory. Other types of files may be supported by the implementation.

So a file is anything we can read from, write to, or both, which also has metadata. Everyone, go home, case closed!

Well, not so fast. Such a definition leaves a whole lot of room for related concepts, and there are obviously differences between regular files and, say, pipes. "Everything is a file" is itself a concept and design pattern rather than a literal statement. Following that pattern, file types such as directories, pipes, device files, in-memory files, and sockets can all be manipulated in a consistent manner through a common set of system calls such as open(), openat(), and write() (plus recv() and send() in the case of sockets). Take USB as an analogy: there are many different devices, but they all connect to the same kind of USB port (never mind that there are actually several port types, from A to C, but you get the idea).

Of course, for that to work there has to be a consistent interface or reference to the actual data, and that's the File Descriptor:

A per-process unique, non-negative integer used to identify an open file for the purpose of file access. The value of a newly-created file descriptor is from zero to {OPEN_MAX}-1.

As such, we can write() to STDOUT via file descriptor 1 in the same fashion as we would write to a regular file such as /home/user/foobar.txt. When you open() a file, you get a file descriptor, and you can use the same write() function to write to that file. That's the whole point the original Unix creators tried to address: minimalist and consistent behavior. When you run command > /home/user/foobar.txt, the shell opens foobar.txt, duplicates the resulting file descriptor onto file descriptor 1 (the command's STDOUT), and then runs the command; to be more precise, if open() returned 3, it will do dup2(3,1) and then execve() the command. Regardless, the command still uses the same write() syscall on file descriptor 1 as if nothing had happened.
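The redirection described above can be sketched roughly as follows. This is only an illustration of the open/dup2/exec sequence, not how any particular shell is implemented; the helper name run_redirected and the use of echo in the demo are assumptions made for the example.

```python
import os
import tempfile


def run_redirected(argv, path):
    """Run argv with stdout redirected to path, roughly like `argv > path`."""
    pid = os.fork()
    if pid == 0:
        # Child process: set up descriptor 1, then replace ourselves with argv.
        try:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            if fd != 1:
                os.dup2(fd, 1)  # descriptor 1 now refers to the opened file
                os.close(fd)    # the original descriptor is no longer needed
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)       # only reached if open, dup2, or exec failed
    # Parent process: wait for the child and report its exit status.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "foobar.txt")
    run_redirected(["echo", "hello"], target)
    with open(target) as f:
        print(repr(f.read()))
```

The command being run never learns that descriptor 1 is a regular file rather than a terminal; it just writes to descriptor 1.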

Of course, when most users think of a file, they think of a regular file on a disk filesystem. This is more consistent with the Regular File definition, section 3.323:

A file that is a randomly accessible sequence of bytes, with no further structure imposed by the system.

By contrast, we have Sockets:

A file of a particular type that is used as a communications endpoint for process-to-process communication as described in the System Interfaces volume of POSIX.1-2017.

Regardless of the type, the actions we can take on different file types are conceptually the same: open, read, write, close.
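As a small illustration of that uniformity, the sketch below pushes the same bytes through a regular file, a pipe, and a socket pair using nothing but write() and read() on file descriptors. The helper names send_and_receive and demo are made up for this example, and it assumes a Unix-like system where socket descriptors accept plain read()/write().

```python
import os
import socket
import tempfile


def send_and_receive(write_fd, read_fd, payload):
    """Write payload to one descriptor and read it back from another."""
    os.write(write_fd, payload)
    return os.read(read_fd, len(payload))


def demo():
    results = {}

    # Regular file: one descriptor for writing, a second open() for reading.
    fd_w, path = tempfile.mkstemp()
    fd_r = os.open(path, os.O_RDONLY)
    results["regular file"] = send_and_receive(fd_w, fd_r, b"hello")
    os.close(fd_w)
    os.close(fd_r)
    os.unlink(path)

    # Pipe: the kernel hands back a read end and a write end.
    pipe_r, pipe_w = os.pipe()
    results["pipe"] = send_and_receive(pipe_w, pipe_r, b"hello")
    os.close(pipe_r)
    os.close(pipe_w)

    # Socket pair: two connected sockets, used here via their raw descriptors.
    a, b = socket.socketpair()
    results["socket"] = send_and_receive(a.fileno(), b.fileno(), b"hello")
    a.close()
    b.close()

    return results


if __name__ == "__main__":
    for kind, data in demo().items():
        print(kind, data)
```

The setup differs per type (open, pipe, socketpair), but once a descriptor exists the same two calls work on all three.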


All Files Have Inodes

What you should have noticed in the file definition is that a file has "certain attributes", and these are stored in inodes. In fact, on Linux specifically, we can refer to the first line of the inode(7) manual:

Each file has an inode containing metadata about the file. An application can retrieve this metadata using stat(2) (or related calls)

Boom. Clear and direct. We're mostly familiar with inodes as a bridge between blocks of data on disk and filenames stored in directories (because that's what directories are: lists of filenames and corresponding inodes). Even in virtual filesystems in the kernel, such as pipefs and sockfs, we can find inodes. Take for instance this code snippet:

/* Builds the name shown for a pipe, "pipe:[<inode number>]", from the dentry's inode. */
static char *pipefs_dname(struct dentry *dentry, char *buffer, int buflen)
{
    return dynamic_dname(dentry, buffer, buflen, "pipe:[%lu]",
            dentry->d_inode->i_ino);
}
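The same thing can be observed from user space. The sketch below calls fstat() on a pipe, a socket, and a regular file and reports each one's type and inode number. On Linux it also reads the /proc/self/fd symlink, whose target for a pipe should have the "pipe:[inode]" form produced by the kernel snippet above. The helper names describe_fd and proc_fd_name are invented for this example, and the /proc lookup is Linux-specific, so it returns None elsewhere.

```python
import os
import socket
import stat
import tempfile


def describe_fd(fd):
    """Return (file type, inode number) for an open descriptor via fstat()."""
    st = os.fstat(fd)
    if stat.S_ISREG(st.st_mode):
        kind = "regular file"
    elif stat.S_ISFIFO(st.st_mode):
        kind = "fifo"
    elif stat.S_ISSOCK(st.st_mode):
        kind = "socket"
    else:
        kind = "other"
    return kind, st.st_ino


def proc_fd_name(fd):
    """Return the /proc/self/fd symlink target, or None if it is unavailable."""
    try:
        return os.readlink(f"/proc/self/fd/{fd}")
    except OSError:
        return None


if __name__ == "__main__":
    pipe_r, pipe_w = os.pipe()
    sock_a, sock_b = socket.socketpair()
    file_fd, path = tempfile.mkstemp()
    for fd in (pipe_r, sock_a.fileno(), file_fd):
        print(describe_fd(fd), proc_fd_name(fd))
    os.unlink(path)
```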

Open File Description

Now that you're thoroughly confused: Linux/Unix also has something known as an Open File Description, and to keep the explanation simple, it's another abstraction. In the words of Stephane Chazelas,

It's more about the record of how the file was opened more than the file itself.

And it's consistent with the POSIX definition:

A record of how a process or group of processes is accessing a file. Each file descriptor refers to exactly one open file description, but an open file description can be referred to by more than one file descriptor. The file offset, file status, and file access modes are attributes of an open file description.
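One way to see that the file offset belongs to the open file description rather than to the descriptor is to compare a dup()'d descriptor with a second, independent open() of the same file. The following is a minimal sketch; the function name offsets_after_write is an assumption made for the example.

```python
import os
import tempfile


def offsets_after_write(path):
    """Write 5 bytes via one descriptor, then report two other offsets."""
    fd_a = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    fd_dup = os.dup(fd_a)                  # same open file description as fd_a
    fd_other = os.open(path, os.O_RDONLY)  # a separate open file description
    try:
        os.write(fd_a, b"hello")
        # lseek(fd, 0, SEEK_CUR) returns the current offset without moving it.
        shared = os.lseek(fd_dup, 0, os.SEEK_CUR)
        independent = os.lseek(fd_other, 0, os.SEEK_CUR)
        return shared, independent
    finally:
        for fd in (fd_a, fd_dup, fd_other):
            os.close(fd)


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "offsets.txt")
    print(offsets_after_write(demo_path))
```

The duplicated descriptor sees the offset advanced by the write, while the independently opened descriptor still sits at the start of the file, even though all three refer to the same file and the same inode.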

Now, if we also look at the book Understanding the Linux Kernel, it states:

Linux implements BSD sockets as files that belong to the sockfs special filesystem...More precisely, for every new BSD socket, the kernel creates a new inode in the sockfs special filesystem.

Remembering that sockets are also referenced by file descriptors, and that there will therefore be an open file description in the kernel for each socket, we can conclude that sockets are files all right.

to be continued . . .maybe