Why does this simple command fail in the Emacs shell (eshell)?
cat file.txt | wc
I have a file with 10241 lines, each shorter than 50 characters. About 90% of the time I run this command, it gives the wrong result (in particular, the wrong line count). Nonetheless, no error message is printed.
Looks like broken pipe is a very common topic, but I haven't found any reasonable explanation. Also, no one proposes any workarounds. How can I get this simple command working reliably?
Of course, I could have just run wc file.txt. But I'm looking for a more general solution in which any tool works reliably when fed from cat: cat file.txt | any_tool_here.
Details
I'm using CentOS 5. This issue appears when using eshell (emacs shell). I'm using GNU Emacs 24.5.2.
Experiments
Sample results using cat file.txt | wc (expected: the first column should always be 10241):
- 8568 25706 110571
- 9837 29513 126947
- 5395 16187 69615
- 9202 27608 118757
- 7299 21899 94199
- 9837 29513 126947
Sample results using wc file.txt:
- 10241 30723 132156
- 10241 30723 132156
- 10241 30723 132156
- 10241 30723 132156
- 10241 30723 132156
- 10241 30723 132156
The cat command itself (when executed alone) works properly. I validated this a few times with the following command: cat file.txt > file2.txt. Then I diff'd both files, and they are identical.
Best Answer
Judging from the information about the shell that was used (eshell), it appears that the streaming aspect of this shell is the culprit. Normally, piping means opening the two ends of a pipe and then doing fork/exec, so you get two processes that share a file descriptor to a pipe, and communication goes directly through the kernel. This way, nothing can get lost: delivery is guaranteed (although if the pipe or any involved stream is buffered, you may have to wait for the first process to exit normally so that the last chunk of the stream is flushed).
Judging from the excerpt from the eshell manual:
eshell is not doing it the normal way, but fakes the pipe using its "buffers" (Emacs' representation of open files) as an intermediate deposit for data, and (without further research) I'd guess that at some point wc performs a read, and Emacs responds with an empty chunk (returning 0 from read is a signal that the stream has ended) instead of waiting for more input from the first program to fill the buffer. If that's the case, it means that eshell is not only inefficient but buggy when it comes to pipes.
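To make the "normal" pipe behaviour described above concrete, here is a minimal Python sketch of a kernel pipe between two processes. It does not reproduce or diagnose what eshell does; it only illustrates the semantics the reader relies on: a blocking read waits while the writer is still alive, and an empty result (0 bytes) arrives only after every write end has been closed. The function name count_lines_via_pipe and the generated "line N" content are made up for this illustration.

```python
import os
import subprocess
import sys


def count_lines_via_pipe(n_lines):
    """Count newlines produced by a child process through a real kernel pipe.

    Hypothetical helper for illustration only; it stands in for
    `cat file.txt | wc -l`, with the parent playing the role of wc.
    """
    read_fd, write_fd = os.pipe()

    # The producer (standing in for cat) writes n_lines short lines to stdout.
    producer_code = (
        "import sys\n"
        f"for i in range({n_lines}):\n"
        "    sys.stdout.write('line %d\\n' % i)\n"
    )
    # Popen performs the fork/exec and attaches the pipe's write end
    # to the child's stdout.
    producer = subprocess.Popen(
        [sys.executable, "-c", producer_code], stdout=write_fd
    )

    # The parent must close its own copy of the write end; otherwise the
    # read below would never see end-of-stream.
    os.close(write_fd)

    count = 0
    while True:
        # Blocks until data is available or all write ends are closed.
        chunk = os.read(read_fd, 4096)
        if not chunk:  # empty bytes == read returned 0 == end of stream
            break
        count += chunk.count(b"\n")

    os.close(read_fd)
    producer.wait()
    return count


if __name__ == "__main__":
    print(count_lines_via_pipe(10241))
```

With a real pipe, the reader sees every line no matter how the writer's output is split into chunks. If an intermediary handed the reader an empty chunk while the writer was still running, the loop above would stop early and undercount, which is the kind of failure guessed at in the answer.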