macOS – the meaning of the command lsof +L1

invisible-files macos terminal

lsof +L1

My understanding is that the command above shows files that have been deleted but are still open. These files have a “link count” of 0, meaning that there is no directory entry (no link, i.e. no name) that leads to them, but the file data still exists. The file data will be deleted when the file is closed. I commonly see references to this issue when people talk about log files that the system is writing to: the file can be deleted, but the process continues to write to it, and it takes up more and more memory. Other people have identified this as possible malicious behavior of malware trying to hide itself.

I am trying to understand how this can happen. When I open a file to read or write in Python, the file exists on the disk. I tried reading the relevant Xcode docs on file access but they are very dense. Are there different kinds of files – those in RAM and those on disk?

These are named files, so it is not memory being used by the application. For example, running this command on my Mac reveals about 300 such files. The 'offending' processes are things like loginwindow, Dock, systemUIS, sharingd, CalendarA, and lots of others. If this is stuff that a process is writing to RAM, can I 'attach' myself to the process with gdb and see what it's doing? Can I write these files to the disk to read them?

Best Answer

There are three things to a "file" on POSIX filesystems:

  • The set of data blocks - the file's actual contents.
  • The inode, which is a structure that holds the list of said blocks, and some metadata (size, ownership, permissions, link count, and some others).
  • One or more directory entries, which contain a name and an inode number (and other things).

What you see when you run ls or look in file browsers are the directory entries, organized in a tree of directories and files. Each directory entry maps a file name to an inode number. The inode number is used to locate the inode, which in turn is used to locate the actual blocks (and check permissions, etc.).
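
You can see this mapping yourself: ls -i prints the inode number next to each name, and Perl's stat() exposes the same metadata. A minimal sketch (the file name ./demo_file is just a placeholder):

#! /usr/bin/perl
use strict;
use warnings;

my $file = "./demo_file";          # placeholder name

open(my $fh, '>', $file) or die "open: $!";
print $fh "hello\n";
close($fh);

# stat() returns the inode's metadata: field 1 is the inode
# number, field 3 is the link count.
my ($ino, $nlink) = (stat($file))[1, 3];
print "inode number: $ino\n";
print "link count:   $nlink\n";    # 1: a single directory entry

unlink($file);                     # clean up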

When you create a file, an inode is created with an initial link count of one, and a directory entry is set up with the name you specified, pointing to that inode.

If you create a hard link, a second directory entry is made with the name you chose, but pointing to the same inode - both directory entries refer to the same inode (i.e. you now have two names that refer to the same file). The link count of the inode is incremented for each new hard link.
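
To watch the link count change, you can create a hard link with Perl's built-in link(). A small sketch, with ./demo_file and ./demo_link as placeholder names:

#! /usr/bin/perl
use strict;
use warnings;

my $file = "./demo_file";       # placeholder names
my $copy = "./demo_link";

open(my $fh, '>', $file) or die "open: $!";
print $fh "shared contents\n";
close($fh);

printf "nlink before: %d\n", (stat($file))[3];   # 1

# link() adds a second directory entry for the same inode.
link($file, $copy) or die "link: $!";

printf "nlink after:  %d\n", (stat($file))[3];   # 2

# Both names resolve to the same inode number:
printf "inodes: %d and %d\n", (stat($file))[1], (stat($copy))[1];

unlink($file, $copy);           # remove both directory entries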

When a process opens a file, using a file name, the kernel does the directory entry lookup, finds the inode, and returns a file descriptor that "refers to" the inode, not the directory entry. The directory entry is irrelevant once the file has been opened - it is just a convenient way to locate the right inode.
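
One way to convince yourself of this: rename the file while it is open. The handle keeps working, because it refers to the inode rather than the name. A sketch (placeholder names again):

#! /usr/bin/perl
use strict;
use warnings;

my $old = "./demo_file";        # placeholder names
my $new = "./demo_renamed";

open(my $fh, '>', $old) or die "open: $!";

# After the rename the original directory entry is gone,
# but the handle still refers to the same inode.
rename($old, $new) or die "rename: $!";

print $fh "written after rename\n";
close($fh);

# The data ended up in the renamed file.
open(my $in, '<', $new) or die "open: $!";
print while <$in>;
close($in);

unlink($new);                   # clean up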

When you delete a file (e.g. using rm), you're not actually deleting the file's data; you're deleting a directory entry. The kernel decrements the inode's link count, but it only deletes the inode (and reclaims its space) when both of the following hold:

  • that directory entry was the last one pointing to it, i.e. the link count is down to zero (this is what lsof +L1 lists: open files that are completely unlinked; see the sketch after this list)
  • there are no remaining open file descriptors that refer to it
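
Here is the sketch mentioned above. Perl's stat() also accepts a file handle (an fstat() under the hood), so you can watch the link count of a still-open file drop to zero (placeholder file name):

#! /usr/bin/perl
use strict;
use warnings;

my $file = "./demo_file";       # placeholder name

open(my $fh, '+>', $file) or die "open: $!";
printf "nlink while linked: %d\n", (stat($fh))[3];   # 1

unlink($file) or die "unlink: $!";

# The inode is still alive (we hold an open descriptor), but its
# link count is now zero: the condition lsof +L1 matches on.
printf "nlink after unlink: %d\n", (stat($fh))[3];   # 0

close($fh);   # last descriptor closed: inode and blocks reclaimed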

So processes can continue operating on that file, even though there is no way to get back to it by browsing the filesystem. This also explains apparent inconsistencies between the output of df and du, for instance:

  • df interrogates the filesystem to see how many free blocks it has. The data blocks of the "hidden" files with no remaining directory entries are not free (processes can still read/write them), so they keep occupying space until the last file descriptor that refers to them is closed.
  • du lists directory entries and sums up their sizes. It can't see these unlinked files, and will thus report less used space than the filesystem does (demonstrated in the sketch after this list).
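
That sketch: write a large file, unlink it while keeping the handle open, and compare du and df. (The file name and the roughly 100 MB size are arbitrary, and the exact output depends on your filesystem.)

#! /usr/bin/perl
use strict;
use warnings;
use IO::Handle;                 # for $fh->flush

my $file = "./big_file";        # placeholder name

# Write roughly 100 MB, then unlink while keeping the handle open.
open(my $fh, '>', $file) or die "open: $!";
print $fh "x" x (100 * 1024 * 1024);
$fh->flush;

unlink($file) or die "unlink: $!";

# du walks directory entries and can no longer see the file;
# df asks the filesystem, which still counts the blocks as used.
system("du -sh .");
system("df -h .");

close($fh);                     # the space becomes free again
system("df -h .");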

If the files are on traditional disks, they continue to occupy disk space just like normal, still-linked files. I/O happens as normal. They don't acquire extra main memory requirements or start eating RAM.

If the unlinked but open files are in a filesystem backed by RAM, then they continue to occupy memory, as they did before being unlinked. (In both cases the files can still grow/shrink too.)

The space will be reclaimed only when the last open file descriptor is closed. (Note that still-open file descriptors get closed when a process exits or is otherwise terminated.)


If you attach a debugger to a program that is using unlinked files, you won't see anything particularly interesting. File I/O calls will look exactly the same as for normal, still-linked files. Nothing special is going on there. By inspecting what is read/written you might get some idea of what the process is using these files for, but that's about it.

As for accessing these files, I'm afraid I don't know OS X enough to tell if there's an easy way. The fdesc pseudo-filesystem looks like it could be useful, but apparently only gives you access to the current process's files.


A simple example of how a process can do this, in Perl. (It can be done in just about any language, including shell scripts.)

Setup and helper function:

#! /usr/bin/perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET); # for rewinding

my $fh;                 # file descriptor/handle
my $test_file = "./test_file";

sub status {            # checks if the file is "visible"
  my @st = stat($test_file);
  if (@st) {
    print "$test_file: file exists\n";
  } else {
    print "$test_file: error: $!\n";
  }
}

The main part:

# open file in read/write mode, creating it if it doesn't exist
# (overwriting it if it does)
if (!open($fh, '+>', $test_file)) {
  die "Failed to open $test_file: $!";
}
print $fh "Some data before unlink.\n";
status();
unlink($test_file);
status();
print $fh "Some data after unlink.\n";

# Rewind
seek($fh, 0, SEEK_SET);
# Print file contents
while (my $line = <$fh>) {
  print "read: $line";
}
# Close
close($fh);

Expected output:

$ perl test.pl
./test_file: file exists
./test_file: error: No such file or directory
read: Some data before unlink.
read: Some data after unlink.

You can move the unlink around a bit (before or after the prints); it won't change anything. There's nothing special about the file handle after the unlink: it can be used like any other file handle (as long as it is kept open).
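
If you want to see such a file show up in lsof +L1 yourself, hold the handle open for a while, for example by replacing the final close in the script with something like:

# Keep the unlinked file open so it can be inspected externally.
print "pid: $$\n";
sleep(60);
close($fh);

While the script sleeps, running lsof +L1 -p <pid> in another terminal should list the unlinked test file with a link count of 0.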