First, create a wrapper script that changes to the directory given in the first (and only) command-line argument, performs whatever setup/variable-initialisation/etc it needs, and then runs your 10 scripts in sequence with whatever args they need.
For example, if each script processes all .jpg, .png, and .gif files in the directory:
#!/bin/bash
# example-wrapper.sh
# Change into the directory given as the only argument; stop if that fails,
# so the scripts below never run in the wrong directory.
cd "$1" || exit 1
script1 *.{jpg,png,gif}
script2 *.{jpg,png,gif}
script3 *.{jpg,png,gif}
script4 *.{jpg,png,gif}
script5 *.{jpg,png,gif}
script6 *.{jpg,png,gif}
script7 *.{jpg,png,gif}
script8 *.{jpg,png,gif}
script9 *.{jpg,png,gif}
script10 *.{jpg,png,gif}
Next, use find to pipe a list of directories into parallel:
find /path/to/parent/ -mindepth 1 -type d -print0 |
parallel -0 -n 1 ./example-wrapper.sh
(The -mindepth 1 option to find excludes the top-level directory, i.e. the parent directory itself.)
By default, parallel will run one instance (a "job") of ./example-wrapper.sh for each CPU core you have. Each instance will get ONE (-n 1) directory name. As soon as a job has finished, another is started (if there are any remaining jobs to run). This makes maximal use of available CPU power, without letting jobs compete with each other for CPU time.
You can use parallel's -j option to tune the number of jobs to run at once. For CPU-intensive tasks, the default of one job per system core is probably what you want.
If your jobs aren't very CPU-intensive but tend to be more I/O-bound, you may want to run 2 or 3 jobs for every core you have, depending on how large your input files are, how fast your storage is, and what kind of devices make up that storage. SSDs don't suffer from seek latency, so they won't be slowed down by multiple processes seeking data from all over the disk. Hard disks do suffer from seek times and WILL slow down when made to seek randomly all over the place; Linux's disk buffering/caching will help, but won't eliminate the problem.
If you want to get other work done (e.g. normal desktop usage) while these jobs are running, use -j to tell parallel to use one or two fewer cores than your system has (e.g. -j 6 on an 8-core system).
NOTE: Tuning parallel processes is a fine art and can take some experimenting to get the best results.
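For example, here are a few ways to express the job count, reusing the pipeline above (the percentages and offsets are just illustrations):

# Two jobs per core, e.g. for I/O-bound work:
find /path/to/parent/ -mindepth 1 -type d -print0 |
parallel -0 -n 1 -j 200% ./example-wrapper.sh

# All cores minus two, to keep some CPU free for desktop use:
find /path/to/parent/ -mindepth 1 -type d -print0 |
parallel -0 -n 1 -j -2 ./example-wrapper.sh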
Anyway, from man parallel:

--jobs N, -j N, --max-procs N, -P N
    Number of jobslots. Run up to N jobs in parallel. 0 means as many as
    possible. Default is 100% which will run one job per CPU core.

    If --semaphore is set, default is 1, thus making a mutex.
This is really basic and primitive use of parallel. It can do a lot more. See the man page for details.
BTW, xargs also has a -P option for running jobs in parallel. For simple usage like this, it makes little difference whether you use xargs -P or parallel. But if your requirements are more complicated, use parallel.
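For comparison, the same pipeline written with xargs would look something like this (nproc prints the number of CPU cores):

find /path/to/parent/ -mindepth 1 -type d -print0 |
xargs -0 -n 1 -P "$(nproc)" ./example-wrapper.sh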
parallel should be packaged for most Linux distros; otherwise it's available from https://www.gnu.org/software/parallel/
First, the extglob controls what ls sees on its command line. It does not control what ls does with what it sees on the command line. This is important because the -R option tells ls to explore recursively any directories it sees on the command line. So, even if the *uploads* directories are not given explicitly on the command line, ls will find them when it explores their parent directories.
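To sketch the pitfall (the !(*uploads*) pattern here is an assumed stand-in for whatever extglob you used):

shopt -s extglob
ls -R !(*uploads*)
# The glob removes top-level "uploads" entries from ls's argument list,
# but -R then recurses into the remaining directories, so ls still lists
# any uploads directories nested inside them.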
Second, as you know: don't parse ls. The output of ls is not meant for use in pipelines or scripts. Trying to use it that way eventually leads to unhappiness.
Third, to get the files that you want, try:
find ./public_html ! -path '*uploads*'
To explain:

The ./public_html tells find to start looking in the ./public_html directory.

By itself, the option -path '*uploads*' matches any path that contains the pattern *uploads*. (-path is similar to find's -name option, but -path matches against the whole path, including directory names.) The preceding !, however, indicates negation. So, the option ! -path '*uploads*' excludes any path matching *uploads*.
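As an illustration, with a hypothetical tree (these paths are made up), the command keeps or drops entries like this:

find ./public_html ! -path '*uploads*'
# ./public_html/index.html         -> printed
# ./public_html/blog/style.css     -> printed
# ./public_html/uploads            -> excluded (path contains "uploads")
# ./public_html/blog/uploads/a.jpg -> excluded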
To get ls-style output while still using the features of find, consider:
find ./public_html ! -path '*uploads*' -exec ls -dalh {} +
You can use bash extended globbing for this. From the man page:

?(pattern-list)   Matches zero or one occurrence of the given patterns
*(pattern-list)   Matches zero or more occurrences of the given patterns
+(pattern-list)   Matches one or more occurrences of the given patterns
@(pattern-list)   Matches one of the given patterns
!(pattern-list)   Matches anything except one of the given patterns

So putting a number range inside will match files/directories. Adding the && conditional will ensure that you only compress images if the match is a directory (and that you actually succeeded in entering it).

Without the extended globbing, you could even just do [1-2][0-9][0-9][0-9]/[0-1][0-9]. This is better than trying a brace expansion, as you won't end up attempting to enter directories for every single year/month, even if you have no images from then.
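Putting it together, a minimal sketch of such a loop (the YYYY/MM directory layout and the tar command are assumptions, not part of the original question):

shopt -s extglob   # enable extended globbing

for dir in +([0-9])/+([0-9]); do
    # && ensures we only compress if $dir is a directory we actually entered;
    # the subshell means we don't have to cd back out afterwards.
    ( cd "$dir" && tar czf images.tar.gz -- *.jpg *.png *.gif )  # assumed compression step
done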