How to avoid broken pipe in commands with cat

Tags: cat, emacs, pipe

Why does this simple command fail in the Emacs shell (eshell)?

cat file.txt | wc 

I have a file with 10241 lines, each shorter than 50 characters. About 90% of the time I run this command, it reports the wrong line count, yet no error message is printed.

Looks like broken pipe is a very common topic, but I haven't found any reasonable explanation. Also, no one proposes any workarounds. How can I get this simple command working reliably?

Of course, I could just run wc file.txt. But I'm looking for a more general solution, one where any tool works correctly when cat is piped into it: cat file.txt | any_tool_here.

Details

I'm using CentOS 5. This issue appears when using eshell (emacs shell). I'm using GNU Emacs 24.5.2.

Experiments

Sample results from cat file.txt | wc (the first column should always be 10241):

  1. 8568 25706 110571
  2. 9837 29513 126947
  3. 5395 16187 69615
  4. 9202 27608 118757
  5. 7299 21899 94199
  6. 9837 29513 126947

Sample of results using wc file.txt:

  1. 10241 30723 132156
  2. 10241 30723 132156
  3. 10241 30723 132156
  4. 10241 30723 132156
  5. 10241 30723 132156
  6. 10241 30723 132156

The cat command itself (when executed alone) is working properly. I validated it with the following command (a few times): cat file.txt > file2.txt. Then, I diff'd both files and they are identical.

Best Answer

Given that the shell in use is eshell, its handling of streams looks like the culprit. Normally, a shell sets up a pipeline by creating a pipe and then calling fork/exec, so the two processes share the pipe's file descriptors and the data travels directly through the kernel. That way nothing can get lost (although if the pipe or any stream involved is buffered, you may have to wait for the first process to exit normally before the last chunk of the stream is flushed).
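To check this outside Emacs, here is a minimal sketch for a system shell such as bash or sh. The helper name count_via_pipe and the temporary sample file are made up for illustration; seq just generates 10241 short lines, as in the question. With a real kernel pipe, each of the six runs should print 10241.

```shell
# Hypothetical helper: count the lines of a file by piping it through cat.
count_via_pipe() {
    cat "$1" | wc -l | tr -d ' '   # tr strips the padding some wc builds add
}

# Build a throwaway file with 10241 short lines, as in the question.
sample=$(mktemp)
seq 1 10241 > "$sample"

# Six runs, matching the six samples in the question.
for run in 1 2 3 4 5 6; do
    count_via_pipe "$sample"
done

rm -f "$sample"
```

If the counts are stable here but not in eshell, that points at eshell's pipe handling rather than at cat or wc.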

Judging from this excerpt from the eshell manual:

Eshell is not a replacement for system shells such as bash or zsh. Use Eshell when you want to move text between Emacs and external processes; if you only want to pipe output from one external process to another (and then another, and so on), use a system shell, because Emacs’s IO system is buffer oriented, not stream oriented, and is very inefficient at such tasks. If you want to write shell scripts in Eshell, don’t; either write an elisp library or use a system shell.

eshell does not set up pipes the normal way. Instead, it fakes the pipe, using its "buffers" (Emacs' representation of open files) as an intermediate store for the data. Without further research, my guess is that at some point wc performs a read and Emacs responds with an empty chunk instead of waiting for the first program to supply more input; a read that returns 0 signals that the stream has ended, so wc stops early. If that is the case, eshell is not only inefficient with pipes but buggy.
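Following the manual's advice, one workaround worth trying is to hand the whole pipeline to a system shell, so that sh rather than Emacs creates the pipe. This is a sketch I have not verified on CentOS 5 with Emacs 24.5. The first command is demo setup for a system shell only (it creates a stand-in file.txt if none exists), and the last command is what you would type at the eshell prompt.

```shell
# Demo setup for a system shell only: create a stand-in file.txt if absent.
[ -f file.txt ] || seq 1 10241 > file.txt

# The workaround itself: one external sh process runs the whole pipeline,
# so the data flowing from cat to wc never passes through Emacs.
sh -c 'cat file.txt | wc'
```

The same pattern should apply to any other tool: put the entire pipeline inside the quoted string, so that eshell only ever starts a single external process.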
