Code first:
Let's do this with Bash's special globs and a for
loop:
#!/bin/bash
shopt -s globstar dotglob
for f in ./** ; do
if file -b -- "$f" | grep -q '^JPEG image data,' ; then
# do whatever you want with the JPEG file "$f" in here:
md5sum -- "$f"
fi
done
Explanation:
First of all, we need to make the Bash globs more useful by enabling the globstar
and dotglob
shell options. Here is their description from man bash
in the SHELL BUILTIN COMMANDS section about shopt
:
dotglob
If set, bash includes filenames beginning with a `.' in the results of
pathname expansion.
globstar
If set, the pattern ** used in a pathname expansion context will match
all files and zero or more directories and subdirectories. If the pattern
is followed by a /, only directories and subdirectories match.
Then we use this new "recursive glob" ./**
in a for
loop to iterate over all files and folders inside the current directory and all its subdirectories. Please always use absolute paths or explicit relative paths starting with a ./
or ../
in your globs, not just **
, to prevent problems with special file names like ~
.
Now we test each file (and folder) name with the file
command for its contents. The -b
option prevents it from printing the file name again before the content information string, which makes filtering more safe.
Now we know that the content information of all valid JPG/JPEG files must start with JPEG image data,
, which is what we test the output of file
for with grep
. We use the -q
option to suppress any output, as we are only interested in grep
's exit code, which indicates if the pattern matched or not.
If it matched, the code inside the if
/then
block will be executed. We can do anything we want in here. The current JPEG filename is available in the shell variable $f
. We just have to make sure to always put it in double quotes to prevent the accidental evaluation of filenames with special characters like spaces, newlines, or symbols. It is also usually best to separate it from other arguments by placing it after --
, which causes most commands to interpret it as a filename even if it's something like -v
or --help
that would otherwise be interpreted as an option.
Bonus question:
Time to blow up some code, for science! Here is the version from your question/book:
for jpeg in `echo "$(file $(find ./ )
| grep JPEG | cut -f 1 -d ':')"`
do
/path/to/command "$jpeg"
done
First of all, allow me to mention how complex they wrote it. We have 4 levels of nested subshells, using mixed command substitution syntaxes (``
and $()
), which are just necessary because of the incorrect/suboptimal usage of find
.
Here find
just lists all files and prints their names, one per line. Then the full output is passed to file
to examine each of them. But wait! One file name per line? What about file names containing newlines? Right, those will break it!
$ ls --escape ne*ne
new\nline
$ file $(find . -name 'ne*ne' )
./new: cannot open `./new' (No such file or directory)
line: cannot open `line' (No such file or directory)
Actually even simple spaces break it too, because those are treated as separators as well by file
. You can't even quote the "$(find ./ )"
here as a remedy, because that would then quote the whole multi-line output as one single filename argument.
$ ls simple*
simple spaces.jpg
$ file $(find ./ -name 'simple*')
./simple: cannot open `./simple' (No such file or directory)
spaces.jpg: cannot open `spaces.jpg' (No such file or directory)
Next step, the file
output gets scanned with grep JPEG
. Don't you think it's a bit easy to trick such a simple pattern, especially as the output of plain file
always contains the file name as well? Basically everything with "JPEG" in its file name will trigger a match, no matter what it contains.
$ echo "to be or not to be" > IAmNoJPEG.txt
$ file IAmNoJPEG.txt | grep JPEG
IAmNoJPEG.txt: ASCII text
Okay, so we have the file
output of all JPEG files (or those who pretend to be one), now they process all lines with cut
to extract the original file name from the first column, separated by a colon... Guess what, let's try this on a file with a colon in its name:
$ ls colon*
colons:evil.jpeg
$ file colon* | grep JPEG | cut -f 1 -d ':'
colons
So to conclude, the approach from your book works, but only if all files it checks do not contain any spaces, newlines, colons and probably other special characters and do not contain the string "JPEG" anywhere in their filenames. It is also kind of ugly, but as beauty lies in the eye of the beholder, I'm not going to ramble about that.
Best Answer
Try using an overlay, with a
chroot
. First, decide the path you want to chroot to, and make sure it exists, and similarly for the path you will overlay on/
(which is where modifications will go):I chose a directory in
/tmp/
as it's atmpfs
on my system (possibly unadvisable, but OK for me), so no changes should reach the disk. You can use asquashfs
and mount it somewhere, and use that as the overlay, but that has the problem of being read-only, I think.Now:
If you make the
upperdir
independent of a physical disk (perhaps by usingtmpfs
), this should protect thelowerdir
.Note the creation of a
root
folder - that's for my.bash_history
. A copy was made of the original.bash_history
, and then appended to.