Bash Scripting – Handling Non-Deterministic Output from Concurrent Processes

Tags: bash, process-substitution, tee

On bash v4.1.2(2), the following simple statement, chosen merely as a minimal example demonstrating the problem, gives seemingly random output:

$ for n in {0..1000}; do echo "$n"; done | 
  tee >(head -n2) >(sort -grk1,1 | head -n3) >/dev/null

whereas the following gives consistent output:

$ seq 0 10000 | tee >(head -n2) >(sort -grk1,1 | head -n3) >/dev/null

Specifically, for the first statement, the sort command chooses apparently random consecutive triplets (e.g., 226,225,224; 52,51,50; 174,173,172; etc.). To get a sense of the heterogeneity of the output, one can run the problematic command many times, and then list the number of distinct possibilities:

$ seq -w 0 2000 | while read x; do for n in {0..1000}; do echo "$n"; done | 
  tee >(head -n2) >(sort -grk1,1 | head -n3) >/dev/null | cat > "file_${x}"; done

Counting the occurrences of the various outputs:

$ for f in file_*; do sort -g "$f" | tail -n3 | paste -sd, ; done  | 
  sort | uniq -c | sort -gk1,1 -k2,2
   1 7,8,9
   1 17,18,19
   1 40,41,42
   1 43,44,45
   1 47,48,49
   1 50,51,52
   1 54,55,56
   1 58,59,60
   1 59,60,61
   1 66,67,68
   1 71,72,73
   1 78,79,80
   1 103,104,105
   1 104,105,106
   1 106,107,108
   1 110,111,112
   1 111,112,113
   1 121,122,123
   1 125,126,127
   1 129,130,131
   1 134,135,136
   1 136,137,138
   1 142,143,144
   1 143,144,145
   1 148,149,150
   1 150,151,152
   1 156,157,158
   1 157,158,159
   1 165,166,167
   1 171,172,173
   1 173,174,175
   1 174,175,176
   1 177,178,179
   1 179,180,181
   1 181,182,183
   1 183,184,185
   1 185,186,187
   1 186,187,188
   1 191,192,193
   1 194,195,196
   1 198,199,200
   1 200,201,202
   1 206,207,208
   1 208,209,210
   1 209,210,211
   1 210,211,212
   1 216,217,218
   1 217,218,219
   1 233,234,235
   1 236,237,238
   1 237,238,239
   1 238,239,240
   1 242,243,244
   1 245,246,247
   1 246,247,248
   1 254,255,256
   1 256,257,258
   1 267,268,269
   1 270,271,272
   1 273,274,275
   1 277,278,279
   1 279,280,281
   1 287,288,289
   1 288,289,290
   1 305,306,307
   1 306,307,308
   1 307,308,309
   1 326,327,328
   1 337,338,339
   1 339,340,341
   1 340,341,342
   1 351,352,353
   1 357,358,359
   1 359,360,361
   1 365,366,367
   1 368,369,370
   1 370,371,372
   1 376,377,378
   1 377,378,379
   1 383,384,385
   1 386,387,388
   1 388,389,390
   1 401,402,403
   1 408,409,410
   1 409,410,411
   1 415,416,417
   1 419,420,421
   1 424,425,426
   1 426,427,428
   1 432,433,434
   1 454,455,456
   1 462,463,464
   1 466,467,468
   1 475,476,477
   1 482,483,484
   1 487,488,489
   1 504,505,506
   1 508,509,510
   1 511,512,513
   1 532,533,534
   1 538,539,540
   1 544,545,546
   1 548,549,550
   1 558,559,560
   1 603,604,605
   1 604,605,606
   1 608,609,610
   1 659,660,661
   1 660,661,662
   1 663,664,665
   1 668,669,670
   1 692,693,694
   1 699,700,701
   1 717,718,719
   1 738,739,740
   1 740,741,742
   1 750,751,752
   1 771,772,773
   1 784,785,786
   1 796,797,798
   1 799,800,801
   1 806,807,808
   1 814,815,816
   1 832,833,834
   1 848,849,850
   1 858,859,860
   1 869,870,871
   1 922,923,924
   1 952,953,954
   1 961,962,963
   1 985,986,987
   2 64,65,66
   2 127,128,129
   2 141,142,143
   2 169,170,171
   2 170,171,172
   2 172,173,174
   2 187,188,189
   2 221,222,223
   2 234,235,236
   2 252,253,254
   2 292,293,294
   2 350,351,352
   2 364,365,366
   2 375,376,377
   2 622,623,624
   2 666,667,668
   3 70,71,72
   3 102,103,104
   3 137,138,139
   3 155,156,157
1826 998,999,1000

shows that the result is correct ~91% of the time. Omitting the >(head -n2) process substitution from the tee statement results in the output being correct 100% of the time.

I don't see why a race condition would be relevant in explaining the problem, since that should only affect the relative ordering of the output of the process substitutions in the tee statement: >(head -n2) may complete first, or >(sort -grk1,1 | head -n3) may, but this should only affect the output order, not the result itself. It would even be understandable if the output of the two commands were randomly interleaved.

Since tee should distribute identical copies of the loop's stdout to the stdin of each >(), and since both process substitutions run in separate subshells (https://unix.stackexchange.com/a/331199/14960), neither one should affect the other, yet they clearly interact. How can the interaction be explained? Also, how can the output of a for/while loop in bash be distributed to multiple, independent processes by tee?

Best Answer

head -n2 quits after reading two lines. The next time tee writes to the pipe to head, it is killed by a SIGPIPE. sort then sees EOF on its own pipe, because the tee at the other end is gone, and sorts only the lines it has received so far.
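The chain of events can be observed directly by checking how tee terminated. The following is a minimal sketch, assuming bash and a tee that does not ignore SIGPIPE by default (as with GNU coreutils); the one-second pause is only there so that head has certainly exited before tee's second write.

```shell
#!/usr/bin/env bash
# The producer emits one line, pauses, then emits another.
# head -n1 exits after the first line, so tee's next write to
# that pipe hits a pipe with no reader.
tee_status=0
{ echo 1; sleep 1; echo 2; } |
  tee >(head -n1 >/dev/null) >/dev/null || tee_status=${PIPESTATUS[1]}

# A status above 128 means "killed by signal (status - 128)".
echo "tee exit status: $tee_status"
```

On Linux this typically reports 141, that is 128 plus 13, the number of SIGPIPE. If the shell was started with SIGPIPE already ignored, tee instead fails with a write error and a different non-zero status.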

The reason you see this with the loop and not with seq is that the loop does several write()s on the pipe to tee and, depending on timing, that will likely cause tee to do several short reads, whereas seq writes its whole output in one go, so tee does just one read(). With a larger input, such as seq 1000000, you'll probably see random behaviour as well.
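The difference between one large write and many small ones can be made reproducible by slowing the producer down. In this sketch, count_seen and slow_lines are hypothetical helper names, and wc -l stands in for the sort consumer so that the number of lines it received is visible. It assumes bash and a seq that emits its short output in a single write.

```shell
#!/usr/bin/env bash
# Run the producer given as arguments through tee, with one consumer
# that exits early (head) and one that counts the lines it receives.
count_seen() {
  # tee may be killed by SIGPIPE; its exit status is not of interest here.
  "$@" | tee >(head -n1 >/dev/null) >(wc -l) >/dev/null || true
}

# Emits four lines as four separate writes, one second apart.
slow_lines() { for n in 1 2 3 4; do echo "$n"; sleep 1; done; }

one_write=$(count_seen seq 1 4)      # all lines arrive in one read()
many_writes=$(count_seen slow_lines) # tee dies partway through

echo "single write: counter saw $one_write lines"
echo "many writes:  counter saw $many_writes lines"
```

With the single write, the counter sees all 4 lines because tee has copied everything before head exits. With the slow producer it typically sees only 1, since tee is killed on its second write to head's pipe.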

To work around the problem, you'd need a version of head that keeps reading after it has output the first 2 lines. For instance, instead of head -n2 (or sed 2q, which also quits early), you could use sed '3,$d'.
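Applied to the command from the question, the sed variant looks like the sketch below. The command substitution and the top3 variable are additions for checking the result, using the same sort -g | tail -n3 | paste summary as in the question; they are not part of the workaround itself.

```shell
#!/usr/bin/env bash
# sed '3,$d' prints the first two lines but keeps reading to EOF,
# so tee never writes to a pipe without a reader.
out=$(
  for n in {0..1000}; do echo "$n"; done |
    tee >(sed '3,$d') >(sort -grk1,1 | head -n3) >/dev/null
)

# The two consumers may print in either order, so summarise by content.
top3=$(printf '%s\n' "$out" | sort -g | tail -n3 | paste -sd, -)
echo "$top3"
```

Because sort now always receives the full input, the summary is 998,999,1000 on every run.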

Or use:

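... | (
 # Ignore SIGPIPE in this subshell; the setting survives the exec of tee.
 trap '' PIPE
 # Each consumer restores the default SIGPIPE handling before it runs.
 exec tee >(trap - PIPE; exec head -n2) >(trap - PIPE; sort -rn | head -n2)
) > /dev/null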

That makes tee (and only tee) ignore SIGPIPE. With some tee implementations, however, you'd see error messages because of the failing write() to the pipe, such as:

tee: /proc/self/fd/13: I/O error

Note that while the sorted output is likely to come after the non-sorted one, there's no guarantee. More generally, you can't really guarantee the order of output of programs that run concurrently unless there's something that coordinates them.
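One way to add such coordination is sketched below; it is an alternative built on named pipes and wait, not something taken from the commands above. Each consumer writes to its own file, and the results are printed in a fixed order once both have finished. The temporary directory and the file names are hypothetical.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
mkfifo "$tmp/a" "$tmp/b"

# Start both consumers in the background; each reads its own FIFO
# to EOF and stores its result in a separate file.
sed '3,$d' <"$tmp/a" >"$tmp/first" &
sort -grk1,1 <"$tmp/b" | head -n3 >"$tmp/top" &

# tee copies the loop's output into both FIFOs.
for n in {0..1000}; do echo "$n"; done | tee "$tmp/a" "$tmp/b" >/dev/null

# wait is the coordination point: both consumers are done after it.
wait
result=$(cat "$tmp/first" "$tmp/top")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Here the first two lines always precede the top three, because nothing is printed until wait has returned.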