Ubuntu – Counting files in a directory

bash, command line, scripts

I use the following code at the end of one of my scripts to tally up the number of files I have processed and moved into that directory.

# Report on Current Status
echo -n "Cropped Files: "
ls "${Destination}" | wc -l

My problem lies with how I handle duplicate files. Right now, I check for the file's presence first (since my script is destructive to the source files it processes). If it finds that a file of that name has already been processed, I alter the filename as follows.

Duplicate file: foo.pdf

Changed name: foo.x.pdf

If there is already a foo.x.pdf, I rename again to foo.xx.pdf, and so on as necessary. I intend to go back later, evaluate each 'version', and pick the best one to keep. Herein lies my problem: I would like to count only the files that do not contain .x., .xx., and so on. How do I strip these out of the ls output so that wc -l counts the unique files only?
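
For illustration, here is a rough sketch of how that rename step might look; $file, $Destination, and the helper variables are assumptions for the example, not the script's actual code:

# Hypothetical sketch of the duplicate-handling step described above.
# "$file" is the source file being processed; "$Destination" is the target directory.
name=$(basename "$file" .pdf)               # e.g. foo
suffix=""
target="${Destination}/${name}.pdf"
while [ -e "$target" ]; do
    suffix="${suffix}x"                     # x, xx, xxx, ...
    target="${Destination}/${name}.${suffix}.pdf"   # foo.x.pdf, foo.xx.pdf, ...
done
mv -- "$file" "$target"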

TL;DR: How do I get the count of files in a given directory that do not contain a given substring in their filename?

Best Answer

To count the files in a directory whose names do not end in .x.pdf, try:

find "${Destination}" -mindepth 1 ! -name '*.x.pdf' -printf '1' | wc -c

To count the files whose names do not end in a period, one or more x characters, a period, and pdf (that is, .x.pdf, .xx.pdf, .xxx.pdf, and so on), try:

find "${Destination}" -mindepth 1 ! -regex '.*\.x+\.pdf' -printf '1' | wc -c

The commands above search recursively through subdirectories. If you don't want that, add the option -maxdepth 1. For example:

find "${Destination}" -mindepth 1 -maxdepth 1 ! -regex '.*\.x+\.pdf' -printf '1' | wc -c

Note that because -printf '1' prints exactly one character per matching file, counting characters with wc -c is safe even if the directory contains filenames with newline characters (which would throw off a line count from ls | wc -l).
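
Plugged back into the reporting snippet from the question, that might look like the following (Destination is the question's own variable; the label text is just an assumption):

# Report on Current Status
echo -n "Cropped Files (unique): "
find "${Destination}" -mindepth 1 -maxdepth 1 ! -regex '.*\.x+\.pdf' -printf '1' | wc -c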
