First, create a wrapper script that changes to the directory given in the first (and only) command-line argument, performs whatever setup/variable-initialisation/etc it needs, and then runs your 10 scripts in sequence with whatever args they need.
For example, if each script processes all .jpg, .png, and .gif files in the directory:
#!/bin/bash
# example-wrapper.sh
# Change into the directory given as the only argument; stop if that fails,
# so the scripts below never run in the wrong directory.
cd "$1" || exit 1
script1 *.{jpg,png,gif}
script2 *.{jpg,png,gif}
script3 *.{jpg,png,gif}
script4 *.{jpg,png,gif}
script5 *.{jpg,png,gif}
script6 *.{jpg,png,gif}
script7 *.{jpg,png,gif}
script8 *.{jpg,png,gif}
script9 *.{jpg,png,gif}
script10 *.{jpg,png,gif}
Next, use find to pipe a list of directories into parallel:
find /path/to/parent/ -mindepth 1 -type d -print0 |
parallel -0 -n 1 ./example-wrapper.sh
(The -mindepth 1 option to find excludes the top-level directory, i.e. the parent directory itself.)
By default, parallel will run one instance (a "job") of ./example-wrapper.sh for each CPU core you have. Each instance will get ONE (-n 1) directory name. As soon as a job has finished, another is started (if there are any remaining jobs to run). This makes maximal use of available CPU power, without letting jobs compete with each other for CPU time.
You can use parallel's -j option to tune the number of jobs to run at once. For CPU-intensive tasks, the default of one job per system core is probably what you want.
If your jobs aren't very CPU-intensive but tend to be more I/O-bound, you may want to run 2 or 3 jobs for every core you have, depending on how large your input files are, how fast your storage is, and what kind of devices make up that storage. SSDs don't suffer from seek latency, so they won't be slowed down by multiple processes seeking data from all over the disk. Hard disks do suffer from seek times and WILL slow down when made to seek randomly all over the place; Linux's disk buffering/caching will help, but won't eliminate the problem.
If you want to get other work done (e.g. normal desktop usage) while these jobs are running, use -j to tell parallel to use one or two fewer cores than your system has (e.g. -j 6 on an 8-core system).
NOTE: Tuning parallel processes is a fine art and can take some experimenting to get the best results.
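For example, here are a few ways to express the job count, reusing the pipeline above (the percentages and offsets are just illustrations):

# Two jobs per core, e.g. for I/O-bound work:
find /path/to/parent/ -mindepth 1 -type d -print0 |
parallel -0 -n 1 -j 200% ./example-wrapper.sh

# All cores minus two, to keep some CPU free for desktop use:
find /path/to/parent/ -mindepth 1 -type d -print0 |
parallel -0 -n 1 -j -2 ./example-wrapper.sh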
Anyway, from man parallel:

--jobs N, -j N, --max-procs N, -P N
    Number of jobslots. Run up to N jobs in parallel. 0 means as many as
    possible. Default is 100% which will run one job per CPU core.

    If --semaphore is set, default is 1, thus making a mutex.
This is really basic and primitive use of parallel. It can do a lot more. See the man page for details.
BTW, xargs also has a -P option for running jobs in parallel. For simple usage like this, it makes little difference whether you use xargs -P or parallel. But if your requirements are more complicated, use parallel.
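For comparison, the same pipeline written with xargs would look something like this (nproc prints the number of CPU cores):

find /path/to/parent/ -mindepth 1 -type d -print0 |
xargs -0 -n 1 -P "$(nproc)" ./example-wrapper.sh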
parallel should be packaged for most Linux distros; otherwise it's available from https://www.gnu.org/software/parallel/
First, the extglob controls what ls sees on its command line. It does not control what ls does with what it sees on the command line. This is important because the -R option tells ls to explore recursively any directories it sees on the command line. So, even if the *uploads* directories are not given explicitly on the command line, ls will find them when it explores their parent directories.
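To sketch the pitfall (the !(*uploads*) pattern here is an assumed stand-in for whatever extglob you used):

shopt -s extglob
ls -R !(*uploads*)
# The glob removes top-level "uploads" entries from ls's argument list,
# but -R then recurses into the remaining directories, so ls still lists
# any uploads directories nested inside them.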
Second, as you know: don't parse ls. The output of ls is not meant for use in pipelines or scripts. Trying to use it that way eventually leads to unhappiness.
Third, to get the files that you want, try:
find ./public_html ! -path '*uploads*'
To explain:

The ./public_html tells find to start looking in the ./public_html directory.

By itself, the option -path '*uploads*' matches any path that contains the pattern *uploads*. (-path is similar to find's -name option, but -path matches against the whole path, including directory names.) The preceding !, however, indicates negation. So, the option ! -path '*uploads*' excludes any path matching *uploads*.
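As an illustration, with a hypothetical tree (these paths are made up), the command keeps or drops entries like this:

find ./public_html ! -path '*uploads*'
# ./public_html/index.html         -> printed
# ./public_html/blog/style.css     -> printed
# ./public_html/uploads            -> excluded (path contains "uploads")
# ./public_html/blog/uploads/a.jpg -> excluded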
To get ls-style output while still using the features of find, consider:
find ./public_html ! -path '*uploads*' -exec ls -dalh {} +
You can use bash extended globbing for this. From the man page:

?(pattern-list)   Matches zero or one occurrence of the given patterns
*(pattern-list)   Matches zero or more occurrences of the given patterns
+(pattern-list)   Matches one or more occurrences of the given patterns
@(pattern-list)   Matches one of the given patterns
!(pattern-list)   Matches anything except one of the given patterns

So putting a number range inside will match files/directories. Adding the && conditional will ensure that you only compress images if the match is a directory (and that you actually succeeded in entering it).

Without the extended globbing, you could even just do [1-2][0-9][0-9][0-9]/[0-1][0-9]. This is better than trying a brace expansion, as you won't end up attempting to enter directories for every single year/month, even if you have no images from then.
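Putting it together, a minimal sketch of such a loop (the YYYY/MM directory layout and the tar command are assumptions, not part of the original question):

shopt -s extglob   # enable extended globbing

for dir in +([0-9])/+([0-9]); do
    # && ensures we only compress if $dir is a directory we actually entered;
    # the subshell means we don't have to cd back out afterwards.
    ( cd "$dir" && tar czf images.tar.gz -- *.jpg *.png *.gif )  # assumed compression step
done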