Determining if a file system object is a procfs/sysfs/etc. “virtual file” which doesn’t report its length correctly

Tags: files, filesystems, proc, stat

Some kinds of pseudo-files, like many in the procfs and sysfs virtual file systems, have no file size (stat returns st_size == 0) and do not support SEEK_END in fseek. These file system objects, such as /proc/cpuinfo, behave more like FIFOs than regular random-access files. Unfortunately, they lie about their file type too: their st_mode field includes the S_IFREG bit, which implies a regular file, not the S_IFIFO bit.

This causes problems with input code that tries to be "smarter" about managing data, and there are many reports of tools that hang when they look at /proc/* files.

I'm working on one of those tools, which has a general-purpose stream input system that can get a stream from various sources, indicated by the user. I can also handle non-random-access streams like pipes and sockets which are purely "sequential," and the same techniques used for them would work on procfs/sysfs/other virtual file system objects — but I must be able to tell unambiguously when I'm looking at one.

Given a FILE pointer, file descriptor, or file pathname, how can I determine in C whether I'm looking at one of these troublesome pseudo-file objects? (Note: just checking for a path beginning with /proc is not sufficient, since a file system can be mounted arbitrarily. I need the OS to tell me if I can trust a file's st_size and SEEK_END.) Is there a reasonably portable-ish solution for modern *nixes?

(This is a programming question; feel free to migrate it to SO if required.)

Best Answer

If you stat a file descriptor and it turns out that the size is reported as 0, then either the file is empty or the size is unknown. If the file is empty, there's not much point playing games to read it more efficiently.

So, if the size is unknown, you will have to use your alternate code path, and if the file is empty, using the alternate code path shouldn't be a problem.

I'd warn against trying to diagnose file type just based on the filename, since that will open you up to a bait-and-switch attack (or bug). You shouldn't try to figure out anything until you've opened the file successfully, and then you should make sure that you use the file descriptor, directly or indirectly, rather than the name, because the name might no longer be associated with the same file.
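To illustrate working from the descriptor rather than the name, here is a minimal sketch, again assuming POSIX. The function name slurp_by_fd and the 4096-byte starting capacity are arbitrary choices for the example. The path is used exactly once, in open(); everything afterwards goes through the descriptor, and st_size is treated only as a hint for the initial buffer size, never as the amount of data to expect.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (name invented for this sketch).
 * Opens `path` once, then reads until end-of-file through the descriptor.
 * Returns a malloc'd buffer (caller frees) and stores its length in
 * *len_out, or returns NULL on error. */
char *slurp_by_fd(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);   /* the only use of the name */
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0) {       /* fstat on the fd, not stat on the path */
        close(fd);
        return NULL;
    }

    /* st_size is only a capacity hint; 0 means "empty or unknown". */
    size_t cap = (S_ISREG(st.st_mode) && st.st_size > 0)
                     ? (size_t)st.st_size + 1
                     : 4096;
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL) {
        close(fd);
        return NULL;
    }

    for (;;) {
        if (len == cap) {            /* buffer full: grow before reading */
            size_t new_cap = cap * 2;
            char *tmp = realloc(buf, new_cap);
            if (tmp == NULL) {
                free(buf);
                close(fd);
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }

        ssize_t n = read(fd, buf + len, cap - len);
        if (n == 0)
            break;                   /* end-of-file is the only stop signal */
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted: retry the read */
            free(buf);
            close(fd);
            return NULL;
        }
        len += (size_t)n;
    }

    close(fd);
    *len_out = len;
    return buf;
}
```

Because the loop stops only when read() returns 0, it behaves the same whether the reported size was accurate, zero, or stale. Note that opening a FIFO this way blocks until a writer appears, so a real tool may want O_NONBLOCK or a check before reading.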

Hope that helps.