Bash – Multiple Processes Redirecting to the Same File

bash, concurrency, io-redirection

This is not a "how to append and not overwrite" question, and I'm not looking for a file that combines the output of two commands. It's just a mistake I made, and I would like to understand why the system did what it did.

I use a command (on a remote ssh command line) that takes a long time to complete and writes data to stdout (line by line, every few seconds), so I redirect it to a file:

command > file.out &

Sometimes the remote session disconnects, but the command keeps running in the background. I didn't know this, so I ran the same command again before the first one had finished:

command > file.out &

When both processes had finished, I expected (after reading some answers on this site) to have a single file with the lines from both commands mixed up, but the output file only has the output from one of the two executions.

Why doesn't the file have both outputs intertwined (as warned in the comments here)? Which of the two outputs does the final file contain?

EDIT:

Removed one of the questions (why is the output file not locked for writing?) as it's explained here.

Best Answer

When you open a file for writing using the > redirection, the file is truncated, i.e. it is completely emptied. It is, however, not deleted and recreated.
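A quick way to see that the file is reused rather than replaced (a minimal sketch; file is just an example name, and the inode number your system prints will differ) is to compare its inode before and after a truncating redirection:

$ echo hello >file
$ ls -i file          # note the inode number
$ : >file             # open for writing and truncate, writing nothing
$ ls -i file          # same inode: the file was emptied, not replaced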

If one command starts by truncating the file and then writes something to it, and if another command then does the same, the first command's position within the file will not change. This means that you have two commands writing to the same file at two independent positions, one possibly overwriting the output of the other, depending on the order of writing and the amount of data being written.
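As a minimal sketch of this (assuming a file called file, with shell file descriptors 3 and 4 chosen arbitrarily), two writers opened independently on the same file each keep their own offset:

#!/bin/sh

# Open "file" twice for writing; each open truncates it and has its own offset.
exec 3>file 4>file
echo aaaa >&3      # written at offset 0 via fd 3; fd 3 is now at offset 5
echo bb >&4        # fd 4 is still at offset 0, so this overwrites the start of "aaaa"
exec 3>&- 4>&-
hexdump -C file    # shows "bb", a newline, and the leftover "a" and newline

Here the second writer clobbers part of the first writer's output because their offsets are tracked separately.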

So, yes, the data in the file may well be an intertwined mess of the output from both programs, but it will depend on the order of writes into the file, as well as the amount of data written and the timing of the truncations of the file.

Here's an example of intertwining the data from two commands:

#!/bin/sh

# First writer: truncates "file", writes "hello", and writes "world" two seconds later.
( { echo hello; sleep 2; echo world; } | cat >file ) &
sleep 1
# Second writer: truncates "file" again and writes "123" while the first is still running.
echo 123 >file &

wait

This is what happens in this script:

  1. The first command opens the file for writing and truncates it. It writes hello\n to it, leaving its file pointer at offset 6.
  2. After one second, the second command truncates the file and writes 123\n to it. At this point, the first command's file pointer is still at offset 6.
  3. After its two-second sleep, the first command writes world\n to the file at that offset.

The result is a file with a stretch of nul characters in the middle:

$ hexdump -C file
00000000  31 32 33 0a 00 00 77 6f  72 6c 64 0a              |123...world.|
0000000c

The nuls (00 in the output above) come from the fact that the first command's file pointer wasn't reset by the second command's truncation of the file, so a "hole" was created. The second command only wrote 123\n but would have overwritten the nuls if it had written more data:

$ hexdump -C file
00000000  31 32 33 34 35 36 77 6f  72 6c 64 0a              |123456world.|
0000000c

Here I've made the second command echo 1234567890, but only 123456 is left of it in the file. This is because the first command continued writing world\n at the offset where its file pointer was left (just after hello\n), overwriting the tail of the second command's output.
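For completeness, the variant that produces this output could look like the following (the same script as before, with only the second echo changed):

#!/bin/sh

( { echo hello; sleep 2; echo world; } | cat >file ) &
sleep 1
echo 1234567890 >file &    # longer write; its tail is later overwritten by "world\n"

wait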
