Bash iterate on pairs of files

bash files shell-script

I have a directory with a bunch of files with names like a04x.txt, each with a corresponding b04y.txt file. I need to be able to run some commands on each pair of files and produce an additional file c04z.txt for each pair.

The actual numbers on the files are rather large and pretty sparse, so simply iterating over all numbers from 1 to 99 or something like that won't work.

Currently I use the following to handle the task, but it seems like a common enough task that there ought to be a shorter or better way to do it:

for num in ./a*x.txt
do
  num="${num##*/a}"
  num="${num%x.txt}"

  my_command "a${num}x.txt" "b${num}y.txt" "c${num}z.txt"
done

Ideally I would also like to be warned when there are a${num}x.txt or b${num}y.txt files that don't have a matching file with the same number. I'd also like an easy way to be able to just pipe the sets of files to xargs or parallel so I can have it process multiple sets of files simultaneously.

Is there a better way to do this?

Best Answer

  1. One approach would be to do

    for afile in a*x.txt
    do
        bfile=${afile/a/b}; bfile=${bfile/x.txt/y.txt}
        cfile=${afile/a/c}; cfile=${cfile/x.txt/z.txt}
    
        my_command "$afile" "$bfile" "$cfile"
    done
    

    although I guess that isn’t a big improvement, and it could fail in a pathological case like a filename of afoox.txtbarx.txt, because each substitution replaces only the first match.  Also, note that the ${parameter/pattern/string} substitution is a bash extension; it might not work in other POSIX-compliant shells (unlike ## and %, which are specified by POSIX).

  2. It’s a simple matter to say

        if [ -f "$bfile" ]
        then
            my_command "$afile" "$bfile" "$cfile"
        else
            echo "Error: $bfile not found for $afile" >&2
        fi
    

    to catch a-file outliers (e.g., a17x.txt with no corresponding b17y.txt).

  3. If you put

    for afile               # with no list, defaults to "$@"; i.e., the script’s arguments
    do
        bfile=${afile/a/b}; bfile=${bfile/x.txt/y.txt}
        cfile=${afile/a/c}; cfile=${cfile/x.txt/z.txt}
    
        if [ -f "$bfile" ]
        then
            my_command "$afile" "$bfile" "$cfile"
        else
            echo "Error: $bfile not found for $afile" >&2
        fi
    done
    

    into a script, then you can run that script with a list of a${num}x.txt filenames as arguments, and it will process them.  You can then invoke that script from xargs or parallel.

  4. Checking for b-file outliers (e.g., b42y.txt with no corresponding a42x.txt) as part of the above process is not straightforward, but it is easy to do in a separate loop:

    for bfile in b*y.txt
    do
        afile=${bfile/b/a}; afile=${afile/y.txt/x.txt}
        if [ ! -f "$afile" ]
        then
            echo "Error: $afile not found for $bfile" >&2
        fi
    done
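
The a-file and b-file outlier checks can also be folded into one helper. What follows is only a sketch: the function name report_orphans is made up for illustration, and it assumes it is run in the directory that holds the files. It uses the POSIX # and % expansions so that only the leading a or b and the trailing x.txt or y.txt are stripped, which also sidesteps the afoox.txtbarx.txt case mentioned in point 1.

```shell
# report_orphans (hypothetical name): warn about every a-file or b-file
# whose partner is missing; returns non-zero if any orphan was found.
report_orphans() {
    rc=0
    for afile in a*x.txt
    do
        [ -e "$afile" ] || continue          # glob matched nothing
        num=${afile#a}; num=${num%x.txt}     # strip leading "a" and trailing "x.txt"
        if [ ! -f "b${num}y.txt" ]
        then
            echo "Warning: $afile has no matching b${num}y.txt" >&2
            rc=1
        fi
    done
    for bfile in b*y.txt
    do
        [ -e "$bfile" ] || continue          # glob matched nothing
        num=${bfile#b}; num=${num%y.txt}     # strip leading "b" and trailing "y.txt"
        if [ ! -f "a${num}x.txt" ]
        then
            echo "Warning: $bfile has no matching a${num}x.txt" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

Calling report_orphans before the main loop and checking its exit status lets a script stop early when the set of files is incomplete.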
    
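For the xargs route in point 3, one possible invocation is sketched below. The wrapper name run_pairs_parallel and its arguments are invented for this example; the command passed to it stands for wherever the loop from point 3 was saved. The -0 and -P options are extensions found in GNU and BSD xargs rather than part of POSIX, so check what your platform provides.

```shell
# run_pairs_parallel JOBS COMMAND [ARGS...]   (hypothetical helper)
# Hands every a*x.txt name in the current directory to COMMAND,
# running up to JOBS copies of it at the same time.
run_pairs_parallel() {
    jobs=$1
    shift
    # NUL-separated names keep unusual filenames intact; -n 1 gives each
    # invocation a single a-file, -P caps the number of concurrent processes.
    printf '%s\0' a*x.txt | xargs -0 -n 1 -P "$jobs" "$@"
}
```

If the loop were saved as process_pairs.sh (an assumed name), `run_pairs_parallel 4 ./process_pairs.sh` would process up to four pairs at once. With GNU parallel, something like `parallel -j 4 ./process_pairs.sh ::: a*x.txt` should behave similarly.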