Are files opened by processes loaded into RAM?

Tags: files, lsof, memory

Commands, for instance sed, are programs, and programs are codified logic stored in files somewhere on the hard disk. However when commands are being run, a copy of their files from the hard disk is put into the RAM, where they come to life, can do things, and are called processes.

Processes can make use of other files, read or write into them, and if they do those files are called open files. There is a command to list all open files by all running processes: lsof.

OK, so what I wonder is whether the double life of a command, one on the hard disk, the other in RAM, also holds for other kinds of files, for instance those that have no logic programmed but are simply containers for data.

My assumption is that files opened by processes are also loaded into RAM. I do not know whether that is true; it is just an intuition.

Please, could someone make sense of it?

Best Answer

However when commands are being run, a copy of their files from the hard disk is put into the RAM,

This is wrong (in general). When a program is executed (through execve(2)...), the file is not copied into RAM up front. Instead, the process running that program gets a new virtual address space, and the kernel reconfigures the MMU for that purpose. Read also about virtual memory. Notice that application programs can change their virtual address space using mmap(2), munmap(2) and mprotect(2), which are also used by the dynamic linker (see ld-linux(8)). See also madvise(2), posix_fadvise(2) and mlock(2).

Subsequent page faults are handled by the kernel, which lazily loads pages from the executable file as they are first touched. Read also about thrashing.
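To make the mmap(2) idea concrete, here is a minimal Python sketch using the standard mmap module. It maps a scratch file into the process's address space and reads a few bytes from it. The helper name first_bytes_via_mmap and the scratch file are made up for illustration; the point is only that mapping a file does not by itself read the whole file, since pages are brought in when they are touched.

```python
import mmap
import os
import tempfile

def first_bytes_via_mmap(path, n=16):
    """Map a file into this process's address space and return its first n bytes.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as f:
        # Length 0 means "map the whole file"; ACCESS_READ gives a read-only mapping.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Slicing touches only the pages that back these bytes.
            return bytes(m[:n])

# Build a scratch file of a bit more than 1 MiB to map.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello page cache\n" + b"\0" * (1024 * 1024))
    scratch = tmp.name

head = first_bytes_via_mmap(scratch)
print(head)
os.unlink(scratch)
```

While such a mapping is alive on Linux, the file would also show up in /proc/self/maps for that process.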

The kernel maintains a large page cache. Read also about copy-on-write. See also readahead(2).

OK, so what I wonder is whether the double life of a command, one on the hard disk, the other in RAM, also holds for other kinds of files, for instance those that have no logic programmed but are simply containers for data.

For system calls like read(2) and write(2) the page cache is also used. If the data to be read is already sitting in it, no disk I/O is done. If disk I/O is needed, the data read will very likely be put in the page cache. So, in practice, if you run the same command twice, it could happen that no physical I/O is done to the disk the second time (if you have an old rotating hard disk, not an SSD, you might hear the difference, or you can carefully watch your hard disk LED).
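A rough way to observe this is to time two consecutive reads of the same file. The following Python sketch does that. The function names timed_read and evict_from_page_cache are made up for illustration, and the eviction step relies on posix_fadvise(2) with POSIX_FADV_DONTNEED, which is only a hint that the kernel may ignore (and which is not available on every platform). Timings will vary a lot between machines and filesystems, so treat the numbers as indicative only.

```python
import os
import tempfile
import time

def timed_read(path):
    """Read a whole file and return (data, elapsed_seconds)."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        data = f.read()
    return data, time.perf_counter() - start

def evict_from_page_cache(path):
    """Ask the kernel to drop cached pages of a file. Advisory; may do nothing."""
    if not hasattr(os, "posix_fadvise"):
        return False  # not available on this platform
    fd = os.open(path, os.O_RDONLY)
    try:
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)

# Create a 4 MiB scratch file and force it to disk so its pages are clean.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(4 * 1024 * 1024))
    tmp.flush()
    os.fsync(tmp.fileno())
    sample = tmp.name

evict_from_page_cache(sample)      # try to start with a cold cache
first, t1 = timed_read(sample)     # may need real disk I/O
second, t2 = timed_read(sample)    # likely served from the page cache
print(f"first read: {t1:.4f}s, second read: {t2:.4f}s")
os.unlink(sample)
```

On a typical Linux system the second read tends to be faster, but if the eviction hint is ignored both reads come from the cache and the difference disappears.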

I recommend reading a book like Operating Systems: Three Easy Pieces (freely downloadable, one PDF file per chapter), which explains all this.

See also Linux Ate My RAM and run commands like xosview, top, htop or cat /proc/self/maps or cat /proc/$$/maps (see proc(5)).
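The same information that cat /proc/self/maps prints can be inspected from a program. Below is a small Python sketch that lists the files currently mapped into a process. The function name file_backed_mappings is hypothetical, and the parsing assumes the field layout described in proc(5) (address, perms, offset, dev, inode, pathname). On systems without /proc it simply returns an empty list.

```python
import os

def file_backed_mappings(pid="self"):
    """Return the sorted list of file paths mapped into a process (Linux only)."""
    maps_path = f"/proc/{pid}/maps"
    if not os.path.exists(maps_path):
        return []  # no procfs here, e.g. not Linux
    paths = set()
    with open(maps_path) as f:
        for line in f:
            # Fields: address perms offset dev inode [pathname]
            fields = line.split(None, 5)
            if len(fields) == 6:
                name = fields[5].strip()
                # Skip pseudo-entries such as [heap], [stack], [vdso].
                if name.startswith("/"):
                    paths.add(name)
    return sorted(paths)

mapped = file_backed_mappings()
for path in mapped[:10]:
    print(path)
```

Run under Linux, this would typically show the Python executable and the shared libraries it has loaded, each of them mapped rather than copied wholesale into RAM.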

PS. I am focusing on Linux, but other OSes also have virtual memory and page cache.
