I've got a question regarding full data journaling on ext3 filesystems. The man page states the following:
data=journal
All data is committed into the journal prior to being written into
the main filesystem.
It seems to me that this means a file is first saved to the journal and then copied to the filesystem.
I assumed that if I download something, it would first be saved in the journal and, once complete, moved to the filesystem. But right after the download starts, the file already appears in the directory (on the filesystem). What am I misunderstanding?
Edit: Maybe it's wrong to think of "all data" as the whole file? If "all data" is only a block or something similar, then it would make sense, and I simply couldn't see that things are written to the journal first.
Best Answer
First, you're right to suspect that “all data” doesn't mean the whole file. In fact, that layer of the filesystem operates on fixed-size blocks, not on whole files. At that level, it's important to keep the amount of data bounded, so working on whole files (which can be arbitrarily large) wouldn't work.
Second, there's a misconception in your question. The journaling behavior isn't something you can observe by looking at the directory contents with ls; it works at a much lower level. With normal tools, you'll always see that the file is there. (It would be catastrophic if creating a file didn't appear to, y'know, create it.) What happens under the hood is that the file can be stored in different ways. At first, the first few blocks are saved in the journal. Then, as soon as efficiently possible, the data is moved to its final location. It's still the same file in the same directory, just stored differently.
The only way you can observe journaling behavior is to go and see exactly what the kernel is writing to the disk, or to analyse the disk content after a crash. In normal operation, the journal is an implementation detail: if you could see it in action (other than performance-wise), it would be severely broken.
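To make this concrete, here is a toy model in Python. It is only a sketch of the idea, not how ext3 is implemented: the class ToyJournaledFS, its methods, and the tiny block size are all made up for illustration. It shows that the file name is visible as soon as the write starts, while the data blocks sit in the journal until they are copied to their final location.

```python
BLOCK_SIZE = 4  # hypothetical tiny block size; real filesystems use e.g. 4096 bytes


class ToyJournaledFS:
    """Toy model of full data journaling (illustrative only, not ext3's design)."""

    def __init__(self):
        self.directory = {}   # file name -> list of block ids (what "ls" would show)
        self.journal = []     # pending (block id, data) records, in write order
        self.disk = {}        # block id -> data stored at its final location
        self._next_block = 0

    def write(self, name, data):
        # The directory entry exists immediately, before any data is checkpointed.
        blocks = self.directory.setdefault(name, [])
        for offset in range(0, len(data), BLOCK_SIZE):
            block_id = self._next_block
            self._next_block += 1
            # Data goes to the journal first, one fixed-size block at a time.
            self.journal.append((block_id, data[offset:offset + BLOCK_SIZE]))
            blocks.append(block_id)

    def checkpoint(self):
        # Copy journaled blocks to their final location, then empty the journal.
        for block_id, chunk in self.journal:
            self.disk[block_id] = chunk
        self.journal.clear()

    def listdir(self):
        return sorted(self.directory)

    def read(self, name):
        # Readers get the same contents wherever the blocks currently live.
        pending = dict(self.journal)
        parts = []
        for block_id in self.directory[name]:
            parts.append(pending[block_id] if block_id in pending else self.disk[block_id])
        return b"".join(parts)


fs = ToyJournaledFS()
fs.write("download.iso", b"abcdefghij")
print(fs.listdir())                   # the file is listed right away
print(len(fs.journal), len(fs.disk))  # all blocks are still in the journal
fs.checkpoint()
print(len(fs.journal), len(fs.disk))  # now all blocks are at their final location
print(fs.read("download.iso"))        # contents are the same before and after
```

From the point of view of listdir and read, nothing changes when checkpoint runs, which is why looking at the directory tells you nothing about the journal.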
For more information about filesystem journals, I recommend starting with the Wikipedia article. In ext3 terms, data=journal ensures that if the system crashes, each file is in a state that it had at some point before the crash (not always the latest state, because of buffering). This doesn't happen automatically because the kernel reorders disk writes for efficiency (it can make a big difference). The Wikipedia article calls this mode a “physical journal”. The other two modes (data=ordered and data=writeback) are forms of “logical journal”: they're faster, but they can lead to corrupted files. The journal limits the risk of corruption to a few files containing garbage; ext3 always uses a full journal for metadata. Without a journal for metadata, metadata can get lost, leading to major filesystem corruption. Furthermore, without a journal, recovery after a crash requires a full filesystem integrity check, whereas with a journal, recovery means replaying a few journal entries.
Note that even with a journal, typical unix filesystems don't guarantee global filesystem consistency, only per-file consistency at most. That is, suppose you write to file foo, then you write to file bar, then the system crashes. It's possible for bar to have the new contents but foo to still have the old contents. To have complete consistency, you need a transactional filesystem.
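If an application only needs one specific ordering (foo on disk before bar is written), a common application-level approach is to flush the first file explicitly before touching the second. The sketch below assumes Python and uses a hypothetical helper write_durably and a temporary directory; it is not a transaction, and how durable the result really is also depends on the storage hardware and its write caches.

```python
import os
import tempfile


def write_durably(path, data):
    """Write data to path and ask the kernel to flush it before returning."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's userspace buffer to the kernel
        os.fsync(f.fileno())  # ask the kernel to push the file's data to the device


# Hypothetical file locations, used only for this illustration.
workdir = tempfile.mkdtemp()
foo = os.path.join(workdir, "foo")
bar = os.path.join(workdir, "bar")

write_durably(foo, b"new foo contents\n")  # foo is flushed first...
write_durably(bar, b"new bar contents\n")  # ...and only then is bar written
```

This is meant to avoid the state where bar is new but foo is still old. A crash between the two calls can still leave foo new and bar old, so it does not replace a transactional filesystem.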