Are files opened by processes loaded into RAM?

files lsof memory

Commands, for instance sed, are programs and programs are codified logic inside a file and these files are somewhere on the hard disk. However when commands are being run, a copy of their files from the hard disk is put into the RAM, where they come to life and can do stuff and are called processes.

Processes can make use of other files, read or write into them, and if they do those files are called open files. There is a command to list all open files by all running processes: lsof.

OK, so what I wonder about is if the double life of a command, one on the hard disk, the other in the RAM is also true for other kind of files, for instance those who have no logic programmed, but are simply containers for data.

My assumption is that files opened by processes are also loaded into RAM. I do not know whether that is true; it is just an intuition.

Please, could someone make sense of it?

Best Answer

However when commands are being run, a copy of their files from the hard disk is put into the RAM,

This is wrong (in general). When a program is executed (through execve(2)...), the process running that program gets a new virtual address space, and the kernel reconfigures the MMU for that purpose. Read also about virtual memory. Note that application programs can change their virtual address space using mmap(2), munmap(2) and mprotect(2), which are also used by the dynamic linker (see ld-linux(8)). See also madvise(2), posix_fadvise(2) and mlock(2).

Future page faults will be processed by the kernel to load (lazily) pages from the executable file. Read also about thrashing.

The kernel maintains a large page cache. Read also about copy-on-write. See also readahead(2).
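The file-backed part of a process's virtual address space can be inspected directly. The following is a minimal sketch, assuming a Linux system with /proc mounted; the helper name mapped_files is made up for illustration. It lists the files (the executable and its shared libraries) that are mapped into a process, whose pages the kernel loads lazily on page faults.

```shell
# List the distinct files mapped into a process's virtual address space.
# Usage: mapped_files PID   (hypothetical helper, Linux only)
mapped_files() {
    # Field 6 of each /proc/PID/maps line is the backing file, if any.
    # Anonymous mappings (heap, stack) have no path and are skipped.
    # Caveat: a path containing spaces is cut at the first space.
    awk '$6 ~ /^\// { print $6 }' "/proc/$1/maps" | sort -u
}

# Show what the current shell has mapped: its own binary plus libraries.
mapped_files "$$"
```

Being listed here only means the file is mapped; it does not mean that all of its pages currently sit in RAM.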

OK, so what I wonder about is if the double life of a command, one on the hard disk, the other in the RAM is also true for other kind of files, for instance those who have no logic programmed, but are simply containers for data.

The page cache is also used for system calls like read(2) and write(2). If the data to be read is already sitting in it, no disk I/O is done. If disk I/O is needed, the data read will very likely be put in the page cache. So, in practice, if you run the same command twice, it may happen that no physical disk I/O is done the second time (with an old rotating hard disk, as opposed to an SSD, you might hear the difference, or see it by watching the hard disk LED carefully).
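How much RAM the page cache currently occupies can be read from /proc/meminfo. This is a small sketch, assuming Linux; the helper name meminfo_kib is made up for illustration, and the field names are those the kernel prints in that file.

```shell
# Print one field of /proc/meminfo in KiB, e.g. "Cached" or "MemFree".
# (hypothetical helper, Linux only)
meminfo_kib() {
    awk -v key="$1:" '$1 == key { print $2 }' /proc/meminfo
}

echo "MemTotal:     $(meminfo_kib MemTotal) KiB"
echo "MemFree:      $(meminfo_kib MemFree) KiB"
echo "Cached:       $(meminfo_kib Cached) KiB   (page cache)"
echo "MemAvailable: $(meminfo_kib MemAvailable) KiB"
```

On a machine that has been reading files for a while, Cached is often large while MemFree is small; cached pages can be reclaimed when programs need the memory.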

I recommend reading a book like Operating Systems: Three Easy Pieces (freely downloadable, one PDF file per chapter), which explains all this.

See also Linux Ate My RAM and run commands like xosview, top, htop or cat /proc/self/maps or cat /proc/$$/maps (see proc(5)).

PS. I am focusing on Linux, but other OSes also have virtual memory and page cache.

Related Solutions

How to list number of open file descriptors by process for all processes on Unix

I'd do something like:

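sudo lsof -FKc |
  awk '
   # Emit one line for the process or thread just finished, then reset.
   function process() {
     if (pid || tid) {
       print n, \
             tid ? tid " (thread of " pid ": " pname")" : pid, \
             name
       n = tid = 0
     }
   }
   # Every -F output line is a one-letter field tag followed by its value.
   {value = substr($0, 2)}
   # p: a new process starts; flush the previous entry first.
   /^p/ {
     process()
     pid = value
     next
   }
   # K: task (thread) ID.
   /^K/ {
     tid = value
     next
   }
   # c: command name; remember it as the process name unless it is a thread.
   /^c/ {
      name = value
      if (!tid)
        pname = value
      next
   }
   # f: one line per open file.
   /^f/ {n++}
   END {process()}' | sort -rn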

This counts open files; replace /^f/ with /^f[0-9]/ to count open file descriptors instead.
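As a rougher alternative that avoids lsof, the descriptor count can also be read from /proc. This is only a sketch, assuming Linux; the helper name count_fds is made up for illustration, and without root it only sees processes whose fd directory the caller may read. Unlike the lsof approach, it does not break the count down per thread.

```shell
# Count the open file descriptors of one process by listing /proc/PID/fd.
# Prints 0 if the directory is missing or unreadable. (hypothetical helper)
count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Print "count pid command" for every process we are allowed to inspect,
# highest count first.
for dir in /proc/[0-9]*; do
    pid=${dir#/proc/}
    n=$(count_fds "$pid")
    [ "$n" -gt 0 ] && echo "$n $pid $(cat "$dir/comm" 2>/dev/null)"
done | sort -rn
```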

Linux – OOM from iterating over very large memory map

Linux allows its memory to be overcommitted, on the grounds that most programs ask for more memory than they actually need. When it finds that it really has handed out more memory than it has, it launches the OOM killer.

The vm.overcommit_memory sysctl can help: setting it to 2 makes the OS return an error when you ask for something it can't handle, rather than hoping for the best.
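The current setting can be checked before changing anything. A minimal sketch, assuming Linux; the helper name describe_overcommit is made up for illustration, and the descriptions paraphrase the three modes the kernel documents.

```shell
# Translate a vm.overcommit_memory value into a short description.
# (hypothetical helper)
describe_overcommit() {
    case "$1" in
        0) echo "heuristic overcommit (default)" ;;
        1) echo "always overcommit" ;;
        2) echo "strict accounting, no overcommit" ;;
        *) echo "unknown mode: $1" ;;
    esac
}

# Reading the setting needs no privileges; fall back if /proc lacks it.
mode=$(cat /proc/sys/vm/overcommit_memory 2>/dev/null || echo unknown)
echo "vm.overcommit_memory = $mode: $(describe_overcommit "$mode")"

# To switch to strict accounting (as root), one would run:
#   sysctl -w vm.overcommit_memory=2
```

In mode 2 an allocation that cannot be backed fails at request time, so the program sees the error itself instead of being killed later.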

Having a swap partition/file big enough to hold your data may allow things to work as expected.

Alternatively, this could be an issue with the files themselves being too big, in which case mmap2() may be a better choice.
