Linux – Are file edits in Linux saved directly to disk?

disk, filesystems, linux

I used to think that file changes are saved directly to disk; that is, as soon as I click/select save and close the file, the changes are written immediately. However, in a recent conversation, a friend of mine told me that is not usually true: the OS (specifically, we were talking about Linux systems) keeps the changes in memory, and a daemon actually writes the content from memory to the disk.

He even gave the example of external flash drives: these are mounted into the system (copied into memory), and sometimes data loss happens because the daemon has not yet saved the contents to the flash memory; that is why we unmount flash drives.

I have no knowledge of how operating systems work, so I have absolutely no idea whether this is true and under which circumstances. My main question is: does this happen as described on Linux/Unix systems (and maybe other OSes)? For instance, does this mean that if I turn off the computer immediately after I edit and save a file, my changes will most likely be lost? Perhaps it depends on the disk type – traditional hard drive vs. solid-state disk?

The question refers specifically to filesystems that use a disk to store the information, though any clarification or comparison is welcome.

Best Answer

if I turn off the computer immediately after I edit and save a file, my changes will most likely be lost?

They might be. I wouldn't say "most likely", but the likelihood depends on a lot of things.


An easy way to increase the performance of file writes is for the OS to just cache the data, tell the application (lie to it, really) that the write went through, and then actually do the write later. This is especially useful if there's other disk activity going on at the same time: the OS can prioritize reads and do the writes later. It can also remove the need for an actual write entirely, e.g., in the case where a temporary file is removed quickly afterwards.
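To make the effect visible, here's a minimal sketch in C (the file name testfile is just an example): it times a buffered write() against the fsync() that forces the data out to the drive. On a typical system the write() returns in a fraction of a millisecond, because the data only went into the page cache:

```c
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static char buf[1 << 20];               /* 1 MiB of zeroes to write */

static double ms(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
}

int main(void)
{
    int fd = open("testfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    struct timespec t0, t1, t2;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (write(fd, buf, sizeof buf) != sizeof buf) { perror("write"); return 1; }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    if (fsync(fd) < 0) { perror("fsync"); return 1; }   /* wait for the disk */
    clock_gettime(CLOCK_MONOTONIC, &t2);

    printf("write(): %8.3f ms  (page cache only)\n", ms(t0, t1));
    printf("fsync(): %8.3f ms  (actually flushed)\n", ms(t1, t2));
    close(fd);
    return 0;
}
```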

The caching issue is more pronounced if the storage is slow. Copying files from a fast SSD to a slow USB stick will probably involve a lot of write caching, since the USB stick just can't keep up. But your cp command returns sooner than the data is actually written, so you can carry on working, possibly even editing the files that were just copied.


Of course, caching like that has the downside you note: some data might be lost before it's actually saved. The user will be miffed if their editor told them the write was successful but the file wasn't actually on the disk. That's why there's the fsync() system call, which is supposed to return only after the file has actually hit the disk. Your editor can use it to make sure the data is fine before reporting to the user that the write succeeded.
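As a sketch of how an editor might do that (save_file and note.txt are hypothetical names, not any particular editor's code), success is reported only once fsync() has returned cleanly:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int save_file(const char *path, const char *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    if (write(fd, data, len) != (ssize_t)len) {  /* short write = failure here */
        close(fd);
        return -1;
    }
    if (fsync(fd) < 0) {        /* data may still be cache-only: flush it */
        close(fd);
        return -1;
    }
    return close(fd);           /* only now tell the user "saved" */
}

int main(void)
{
    const char *text = "hello, disk\n";
    if (save_file("note.txt", text, strlen(text)) == 0)
        puts("saved (on disk, as far as the OS can tell)");
    else
        perror("save failed");
    return 0;
}
```

A robust editor would typically go further: write to a temporary file, fsync() it, rename() it over the original, and fsync() the containing directory, so that a crash mid-save can't take out the old copy as well.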

I said, "is supposed to", since the drive itself might tell the same lies to the OS and say that the write is complete, while the file really only exists in a volatile write cache within the drive. Depending on the drive, there might be no way around that.

In addition to fsync(), there are also the sync() and syncfs() system calls that ask the system to make sure all system-wide writes or all writes on a particular filesystem have hit the disk. The utility sync can be used to call those.
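For example (a minimal sketch; /mnt/usb is just a stand-in for whatever mount point you care about):

```c
#define _GNU_SOURCE             /* for syncfs() */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    sync();                     /* write out dirty data system-wide */

    /* syncfs() flushes only the filesystem containing the given fd. */
    int fd = open("/mnt/usb", O_RDONLY | O_DIRECTORY);
    if (fd < 0) { perror("open"); return 1; }
    if (syncfs(fd) < 0) { perror("syncfs"); return 1; }
    close(fd);
    return 0;
}
```

(In recent GNU coreutils, sync -f path does the syncfs() variant from the shell.)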

Then there's also the O_DIRECT flag to open(), which is supposed to "try to minimize cache effects of the I/O to and from this file." Bypassing the cache reduces performance, so that's mostly used by applications (databases) that do their own caching and want to be in control of it. (O_DIRECT isn't without its issues; the comments about it in the man page are somewhat amusing.)
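One of those issues is that O_DIRECT generally requires the buffer, file offset, and transfer size to be aligned to the device's logical block size. A minimal sketch assuming a 4096-byte block size (direct.dat is just an example name):

```c
#define _GNU_SOURCE             /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define ALIGN 4096              /* a common logical block size */

int main(void)
{
    void *buf;
    if (posix_memalign(&buf, ALIGN, ALIGN) != 0) {  /* aligned buffer */
        fprintf(stderr, "posix_memalign failed\n");
        return 1;
    }
    memset(buf, 'x', ALIGN);

    int fd = open("direct.dat", O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* The length must be a multiple of the block size, too. */
    if (write(fd, buf, ALIGN) != ALIGN) { perror("write"); return 1; }
    close(fd);
    free(buf);
    return 0;
}
```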


What happens on a power loss also depends on the filesystem. It's not just the file data that you should be concerned about, but the filesystem metadata. Having the file data on disk isn't much use if you can't find it. Just extending a file to a larger size will require allocating new data blocks, and that allocation has to be recorded somewhere.

How a filesystem deals with metadata changes and with the ordering between metadata and data writes varies a lot. E.g., with ext4, if you set the mount flag data=journal, then all writes – even data writes – go through the journal and should be rather safe. That also means they get written twice, so performance goes down. The default options instead try to order the writes so that the data is on the disk before the metadata is updated. Other options or other filesystems may be better or worse; I won't even attempt a comprehensive study.


In practice, on a lightly loaded system, the file should hit the disk within a few seconds. If you're dealing with removable storage, unmount the filesystem before pulling the media, to make sure the data is actually sent to the drive and there's no further activity. (Or have your GUI environment do that for you.)
