Bash iterate on pairs of files

bash, files, shell-script

I have a directory with a bunch of files with names like a04x.txt, each with a corresponding b04y.txt file. I need to be able to run some commands on each pair of files and produce an additional file c04z.txt for each pair.

The actual numbers on the files are rather large and pretty sparse, so simply iterating over all numbers from 1 to 99 or something like that won't work.

Currently I use the following to handle the task, but it seems like a common enough task that there ought to be a shorter or better way to do it:

for num in ./a*x.txt
do
  num="${num##*/a}"
  num="${num%x.txt}"

  my_command a${num}x.txt b${num}y.txt c${num}z.txt
done

Ideally I would also like to be warned when there are a${num}x.txt or b${num}y.txt files that don't have a matching file with the same number. I'd also like an easy way to be able to just pipe the sets of files to xargs or parallel so I can have it process multiple sets of files simultaneously.

Is there a better way to do this?

Best Answer

One approach would be to do
```
for afile in a*x.txt
do
    bfile=${afile/a/b}; bfile=${bfile/x.txt/y.txt}
    cfile=${afile/a/c}; cfile=${cfile/x.txt/z.txt}

    my_command "$afile" "$bfile" "$cfile"
done
```
although I guess that isn’t a big improvement, and it could fail in a pathological case like a filename of afoox.txtbarx.txt. Also, note that this is specifically a bash feature; it might not work in other POSIX-compliant shells (unlike ## and %, which are specified by POSIX).
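One way to sidestep the pathological case, while staying bash-specific, is to anchor the patterns: in bash, ${var/#pat/rep} only matches at the start of the value and ${var/%pat/rep} only at the end. The following is a minimal sketch; the function name pair_names is made up for illustration.

```shell
# pair_names (hypothetical helper): print the b and c filenames for one a file.
# Requires bash; /# anchors the match at the start, /% at the end.
pair_names() {
    afile=$1
    bfile=${afile/#a/b}; bfile=${bfile/%x.txt/y.txt}
    cfile=${afile/#a/c}; cfile=${cfile/%x.txt/z.txt}
    printf '%s\n' "$bfile" "$cfile"
}
```

With anchored patterns, only the leading a and the trailing x.txt are rewritten, so an x.txt in the middle of a name is left alone.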

It’s a simple matter to say

    if [ -f "$bfile" ]
    then
        my_command "$afile" "$bfile" "$cfile"
    else
        echo Error
    fi

to catch a-file outliers (e.g., a17x.txt with no corresponding b17y.txt).

If you put

for afile               # with no list, defaults to "$@"; i.e., the script’s arguments
do
    bfile=${afile/a/b}; bfile=${bfile/x.txt/y.txt}
    cfile=${afile/a/c}; cfile=${cfile/x.txt/z.txt}

    if [ -f "$bfile" ]
    then
        my_command "$afile" "$bfile" "$cfile"
    else
        echo Error
    fi
done

into a script, then you can run that script with a list of a${num}x.txt filenames as arguments, and it will process them. You can then run that script through xargs or parallel.
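As a rough sketch of the xargs route, the loop can also be passed inline to sh -c instead of being saved as a separate script. The function name run_pairs_parallel and the values 8 (files per invocation) and 4 (parallel jobs) are arbitrary choices for illustration, and my_command is assumed to be an executable on the PATH. The -0 and -P options are extensions available in GNU and BSD xargs, not part of POSIX. This version derives the names with # and % so that it does not depend on bash.

```shell
# run_pairs_parallel (hypothetical helper): feed the a files to xargs,
# which runs up to 4 sh processes at once, each handling up to 8 files.
run_pairs_parallel() {
    printf '%s\0' a*x.txt |
    xargs -0 -n 8 -P 4 sh -c '
        for afile                       # iterates over the arguments from xargs
        do
            [ -e "$afile" ] || continue # skip an unmatched glob pattern
            num=${afile#a}; num=${num%x.txt}
            bfile=b${num}y.txt
            cfile=c${num}z.txt
            if [ -f "$bfile" ]
            then
                my_command "$afile" "$bfile" "$cfile"
            else
                echo "Error: $afile has no matching $bfile" >&2
            fi
        done
    ' sh
}
```

Run run_pairs_parallel from the directory that holds the files. GNU parallel could presumably be substituted for xargs in the same position, but its options differ.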

Checking for b-file outliers (e.g., b42y.txt with no corresponding a42x.txt) as part of the above process is not straightforward, but it is easy to do in a separate loop:
```
for bfile in b*y.txt
do
    afile=${bfile/b/a}; afile=${afile/y.txt/x.txt}
    if [ ! -f "$afile" ]
    then
        echo Error
    fi
done
```
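If a standalone check is acceptable, both kinds of outliers can be reported in one pass over both globs. This is only a sketch: the function name check_pairs and the wording of the warning are made up, and it uses the POSIX # and % expansions rather than the bash substitution.

```shell
# check_pairs (hypothetical helper): warn about every a or b file whose
# partner is missing; returns 1 if any file is unpaired, 0 otherwise.
check_pairs() {
    unpaired=0
    for f in a*x.txt b*y.txt
    do
        [ -e "$f" ] || continue         # skip an unmatched glob pattern
        case $f in
            a*x.txt) num=${f#a}; num=${num%x.txt}; partner=b${num}y.txt ;;
            b*y.txt) num=${f#b}; num=${num%y.txt}; partner=a${num}x.txt ;;
        esac
        if [ ! -f "$partner" ]
        then
            echo "Warning: $f has no matching $partner" >&2
            unpaired=1
        fi
    done
    return "$unpaired"
}
```

Calling check_pairs before the main loop gives the warnings up front, and its return status can be used to abort the run if any file is unpaired.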

Related Solutions

Shell – How to iterate over two sets of iterables in a shell script

Ok, so you want to zip two iterables, or in other words you want a single loop, iterating over a bunch of strings, with an additional counter. It's quite easy to implement a counter.

n=0
for x in $commands; do
  mv -- "$x" "$n.jpg"
  n=$(($n+1))
done

Note that this only works if none of the elements that you're iterating over contains any whitespace (nor globbing characters). If you have items separated by newlines, turn off globbing and split only on newlines.

n=0
IFS='
'; set -f
for x in $commands; do
  mv -- "$x" "$n.jpg"
  n=$(($n+1))
done
set +f; unset IFS

If you only need to iterate over the data once, loop around read (see Why is while IFS= read used so often, instead of IFS=; while read..? for more explanations).

n=0
while IFS= read -r x; do
  mv -- "$x" "$n.jpg"
  n=$(($n+1))
done <<EOF
…
EOF

If you're using a shell that has arrays (bash, ksh or zsh), store the elements in an array. In zsh, either run setopt ksh_arrays to number array elements from 0, or adapt the code for array element numbering starting at 1.

commands=(
    ./01/IMG0000.jpg
    …
)
n=0
while [[ $n -lt ${#commands[@]} ]]; do   # ${#commands[@]} is the number of elements
  mv -- "${commands[$n]}" "$n.jpg"
  n=$(($n+1))
done

Shell – Sort files by highest number in filename

With zsh:

typeset -A greatest
for f (*-*(n)) greatest[${f%-*}]=$f
cp -- $greatest /destination

*-*(n): non-hidden files whose name contains a - (*-*), sorted numerically ((n) glob qualifier).
${f%-*}: part of the filename up to the right-most - (or to the end if there's no -).
$greatest: expands to the non-empty values of the associative array. So here, for files that share the same root, only the file with the greatest number will be expanded.
