Linux – way around broken pipe

Tags: bash, linux, pipe, shell, sorting

I have a directory with a large number of files.

./I_am_a_dir_with_many_subdirs/

Within a script, I'd like to find all subdirectories in it, sort them, and store the result in a bash array. So I do:

    SubdirsArray=(`find ./I_am_a_dir_with_many_subdirs/ -maxdepth 2 -mindepth 2 -type d | sort`)

Executing the script, I get the following error messages:

    sort: write failed: standard output: Broken pipe
    sort: write error

As explained in this post, sort probably exits and closes the pipe before find finishes writing to it. The write() call made by find then fails with EPIPE ("Broken pipe"), and the OS sends find a SIGPIPE. Before the SIGPIPE reaches find, find prints the error message, then receives SIGPIPE and dies.
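The mechanism that post describes can be reproduced in isolation. Below is a small bash sketch (an illustration, not code from the post) in which yes plays a writer that never stops and head plays a reader that exits after one line. Once head is gone, the next write by yes fails; with SIGPIPE at its default action, yes is killed and bash reports exit status 141 (128 + 13, the number of SIGPIPE on Linux).

```shell
#!/usr/bin/env bash
# Writer: `yes` prints "y" forever. Reader: `head -n 1` exits after one line,
# closing the read end of the pipe while `yes` is still writing.
yes | head -n 1 > /dev/null

# Copy PIPESTATUS immediately; the next command would overwrite it.
statuses=("${PIPESTATUS[@]}")

# With SIGPIPE at its default action, `yes` is killed and reports 141.
# If the environment ignores SIGPIPE, `yes` instead gets EPIPE, prints its own
# "Broken pipe" error and exits with a different non-zero status.
echo "writer (yes) exit status: ${statuses[0]}"
echo "reader (head) exit status: ${statuses[1]}"
```

Note that in this sketch it is the writer (yes) that fails, while the reader (head) exits normally.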

Questions:

  1. So what does my SubdirsArray contain? The subdirectories that find found, but left unsorted by sort?

  2. If so, then what is the way around this broken-pipe issue? Should I make find write its results to a temporary file and then have sort read it?

    I don't understand why "it's also nothing to be concerned about" if it happens within a non-interactive shell. Why not? My SubdirsArray would then contain unsorted entries, while later in the script I assume that its elements are sorted.

  3. I get two error messages:

    sort: write failed: standard output: Broken pipe
    sort: write error
    

In this thread it is suggested that sort doesn't have enough space in a temporary directory to sort all the input. But doesn't that mean sort did receive something from find? I'm confused…
Anyway, I tried:

    SubdirsArray=(`find ./I_am_a_dir_with_many_subdirs/ -maxdepth 2 -mindepth 2 -type d | sort -T /home/temp_dir`)

but it didn't help.
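For question 2, the temporary-file approach could look like the following bash sketch. It assumes bash 4 or later (for mapfile) and builds a throwaway demo tree with mktemp -d in place of ./I_am_a_dir_with_many_subdirs/; in a real script you would point top at the real directory and drop the demo lines. There is no pipe between find and sort here, so neither of them can receive SIGPIPE from the other.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo tree standing in for ./I_am_a_dir_with_many_subdirs/.
# In a real script: top=./I_am_a_dir_with_many_subdirs/ and remove $demo below.
demo=$(mktemp -d)
top="$demo/I_am_a_dir_with_many_subdirs"
mkdir -p "$top/b/y" "$top/a/x" "$top/a/w"

# A uniquely named temporary file for this run, removed on exit.
list=$(mktemp)
trap 'rm -f "$list"; rm -rf "$demo"' EXIT

# Step 1: find writes to a regular file instead of a pipe.
find "$top" -maxdepth 2 -mindepth 2 -type d > "$list"

# Step 2: sort the file in place (sort's -o may name one of its input files).
sort -o "$list" "$list"

# Step 3: one array element per line, so names with spaces stay intact.
mapfile -t SubdirsArray < "$list"

printf '%s\n' "${SubdirsArray[@]}"
```

For the demo tree, this prints the .../a/w, .../a/x and .../b/y paths in that order.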

P.S.

I'm not sure whether it's important, but I use find|sort in a multi-processor script: several processors execute the same command at once in subshells.
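Running the same pipeline in several subshells at once should not be a problem by itself, as long as no two of them share a file. The following hypothetical bash sketch (the worker count and paths are made up for illustration) starts four background subshells, gives each its own output file from mktemp, waits for all of them, and then checks that every worker produced the same sorted list.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo tree standing in for the real directory.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/top/c/1" "$demo/top/a/1" "$demo/top/b/1"

# Launch several workers at once; each one writes only to its own mktemp file.
outs=()
for i in 1 2 3 4; do
    out=$(mktemp "$demo/out.XXXXXX")
    outs+=("$out")
    ( find "$demo/top" -maxdepth 2 -mindepth 2 -type d | sort > "$out" ) &
done
wait  # block until every background worker has finished

# Compare every worker's result with the first one.
for out in "${outs[@]}"; do
    cmp -s "${outs[0]}" "$out" && echo "$out: identical to first worker"
done
```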

Best Answer

    sort: write failed: standard output: Broken pipe

The problem is not between find and sort. It is sort that has trouble writing its output, which means the shell is not willing to read such a long list into a variable.

You'll have to process the input with while read…, storing it in a temporary file if you need it more than once. This has the added advantage of splitting on newlines only, so it correctly handles filenames containing spaces, which the backtick approach does not.
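A minimal bash sketch of the while read… suggestion follows. The scratch tree and its names are hypothetical, with one directory name containing a space. The backtick version splits that name into two array elements, whereas the while IFS= read -r loop keeps exactly one element per line. The loop reads from process substitution (< <(...)) rather than from a pipe, so it runs in the current shell and the array is still set after the loop.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch tree; one subdirectory name contains a space.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/top/a/x" "$demo/top/a/b c"

# Backtick version: the output is word-split on spaces, tabs and newlines.
split_array=(`find "$demo/top" -maxdepth 2 -mindepth 2 -type d | sort`)

# while-read version: IFS= keeps leading/trailing blanks, -r keeps backslashes,
# and each input line becomes exactly one array element.
SubdirsArray=()
while IFS= read -r dir; do
    SubdirsArray+=("$dir")
done < <(find "$demo/top" -maxdepth 2 -mindepth 2 -type d | sort)

echo "backtick version: ${#split_array[@]} elements"
echo "while-read version: ${#SubdirsArray[@]} elements"
```

Assuming the temporary directory path itself contains no spaces, the first line reports 3 elements and the second 2.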

Unfortunately, since you don't say how you want to use the result, I can't tell you exactly how to rewrite it.

Note that arrays are not part of the POSIX shell specification, and some shells that are noticeably faster than bash don't have them. That's why many people, including me, often avoid using them in scripts.
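For shells without arrays, one option is to keep the sorted list in a file and process it line by line. The following sketch targets POSIX sh; it assumes mktemp is available and keeps the -maxdepth/-mindepth options from the question, which GNU and BSD find both support. The loop body (numbering and printing each directory) is only a placeholder for whatever the script really does with each entry.

```shell
#!/bin/sh
# POSIX sh sketch: no arrays and no process substitution.
set -eu

# Hypothetical demo tree standing in for the real directory.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/top/b/1" "$demo/top/a/1"

# Store the sorted list in a file so it can be read more than once.
list="$demo/subdirs.txt"
find "$demo/top" -maxdepth 2 -mindepth 2 -type d | sort > "$list"

# Redirecting a file into the loop (instead of piping into it, which many
# shells run in a subshell) keeps variables set inside the loop visible after it.
count=0
while IFS= read -r dir; do
    count=$((count + 1))
    printf '%s: %s\n' "$count" "$dir"
done < "$list"
```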
