Linux – Why is writing to an existing file faster than writing a new empty file

files, filesystems, linux, performance

I use a MappedByteBuffer to write a file on Linux.

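File file = new File("testFile");
try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
    // file.length() is 0 for a freshly created file, so size the file and the
    // mapping from the data instead (assumes buffer is a byte[])
    raf.setLength(buffer.length);
    FileChannel fc = raf.getChannel();
    MappedByteBuffer mbf = fc.map(FileChannel.MapMode.READ_WRITE, 0, buffer.length);
    mbf.put(buffer);
}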

If testFile already contains 500MB (written once before) and I write the same 500MB of data a second time, it takes 1s. But when I rm testFile and write the 500MB of data to a new file, it takes 4s.

Why is overwriting a file faster than writing a new file? How can I write a new file as fast as I can overwrite an existing one?

Best Answer

Whether overwriting an existing file or creating a new one is faster depends on the filesystem type. Many filesystems (ext4, for instance) overwrite file data in place; there, overwriting is faster because it only requires writing the data, whereas creating a new file requires first allocating space and then writing the data into the newly allocated space. Some filesystems never overwrite an existing block in place (so that a write can be undone; this is known as copy-on-write), and on those, overwriting an existing file is done by writing the new data to fresh blocks and then freeing the old ones. I wouldn't expect a large difference in either case, though.
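To make the two cases concrete, here is a minimal Java sketch of the two code paths the question compares. The class and method names are hypothetical, and it assumes the data is a `byte[]`. Opening with `"rw"` does not truncate, so the overwrite path writes into blocks the file already owns; the fresh path deletes the file first, so on an in-place filesystem the new blocks typically have to be allocated when the dirty pages are written back. On a copy-on-write filesystem both paths end up writing to newly allocated blocks.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class OverwriteVsFresh {

    // Writes data over the start of an existing file. "rw" mode never truncates,
    // so if the file is already at least data.length bytes long, it is not extended.
    static void overwriteExisting(Path path, byte[] data) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "rw")) {
            if (raf.length() < data.length) {
                raf.setLength(data.length); // only grows the file if it was too short
            }
            mapAndPut(raf.getChannel(), data);
        }
    }

    // Deletes the file first, so the data lands in a brand-new file whose
    // blocks the filesystem has to allocate.
    static void writeFresh(Path path, byte[] data) throws IOException {
        Files.deleteIfExists(path);
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "rw")) {
            raf.setLength(data.length); // typically a sparse extension: no data blocks yet
            mapAndPut(raf.getChannel(), data);
        }
    }

    private static void mapAndPut(FileChannel fc, byte[] data) throws IOException {
        MappedByteBuffer mbf = fc.map(FileChannel.MapMode.READ_WRITE, 0, data.length);
        mbf.put(data);
        mbf.force(); // write the mapped dirty pages back to the device before returning
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get("testFile");
        byte[] data = new byte[16 * 1024 * 1024]; // 16 MiB of zeros, small enough to run anywhere
        writeFresh(path, data);        // allocation + write
        overwriteExisting(path, data); // write into existing blocks on in-place filesystems
        System.out.println(Files.size(path)); // prints 16777216
        Files.delete(path);
    }
}
```

Both paths produce the same file contents; the difference is only in what the filesystem has to do underneath.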

The layers underneath the filesystem can have similar effects, making one operation more costly than the other. For example, on a storage system that keeps snapshots, overwriting data means the old data has to be kept around so that the snapshot can still be restored. Flash media can only be erased in large blocks, so new data is written to free sectors; overwriting data eventually causes the old sectors to be freed and erased, which takes time.

By far the biggest effect on read and write timing comes from buffering and caching. Make sure that you're running your benchmarks in a known cache configuration (you should probably flush the disk cache before starting each benchmarked operation) and that each run ends with all buffers written (finish by calling sync), unless you specifically want to measure warm-cache/buffer timings. For example, doing two consecutive writes where the first write only reaches memory buffers won't cost much more than doing a single write.
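A rough Java sketch of that advice, assuming the same mapped-buffer approach as the question: it times a fresh write and then an overwrite, and calls `MappedByteBuffer.force()` and `FileChannel.force(true)` so the clock stops only after the data and metadata have been pushed to the device. Dropping the page cache between runs is not something Java can do; on Linux that is usually done from a root shell with `sync; echo 3 > /proc/sys/vm/drop_caches` before each run. The class name, file name and 64 MiB size are arbitrary choices for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class MappedWriteTiming {

    // Writes data through a mapping and returns the elapsed time in milliseconds,
    // including the time to push dirty pages and file metadata to the device.
    static long timedWrite(Path path, byte[] data) throws IOException {
        long start = System.nanoTime();
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "rw")) {
            if (raf.length() < data.length) {
                raf.setLength(data.length);
            }
            FileChannel fc = raf.getChannel();
            MappedByteBuffer mbf = fc.map(FileChannel.MapMode.READ_WRITE, 0, data.length);
            mbf.put(data);
            mbf.force();    // roughly msync(): write the mapped dirty pages back
            fc.force(true); // roughly fsync(): also flush metadata such as block allocation
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get("testFile");
        byte[] data = new byte[64 * 1024 * 1024];
        Files.deleteIfExists(path);
        long fresh = timedWrite(path, data);     // new file: blocks must be allocated
        long overwrite = timedWrite(path, data); // existing file of the same size
        System.out.println("new file:  " + fresh + " ms");
        System.out.println("overwrite: " + overwrite + " ms");
        Files.delete(path);
    }
}
```

Without the two `force` calls, both timings would mostly measure copying into the page cache, which is exactly the warm-buffer effect described above.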

In any case, if it takes 4s to do the operation you want, then it takes 4s. There's no magical way to make it 4 times faster.
