Ubuntu – Counting files in a directory

bash, command line, scripts

I use the following code at the end of one of my scripts to tally up the number of files I have processed and moved into that directory.

# Report on Current Status
echo -n "Cropped Files: "
ls "${Destination}" | wc -l

My problem lies with how I handle duplicate files. Right now, I check for the file's presence first (since my script is destructive to the source files it processes). If it finds that a file of that name has already been processed, I alter the filename as follows.

Duplicate file: foo.pdf

Changed name: foo.x.pdf

If there is already a foo.x.pdf, I rename again to foo.xx.pdf, and so on as necessary. I intend to go back later, evaluate each 'version', and pick the best one to keep. Herein lies my problem: I would like to count only the files that do not contain .x., .xx., and so on. How do I strip these out of the ls output so that wc -l counts the unique files only?
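
For illustration, here is a rough sketch of how that rename step might look; $file, $Destination, and the helper variables are assumptions for the example, not the script's actual code:

# Hypothetical sketch of the duplicate-handling step described above.
# "$file" is the source file being processed; "$Destination" is the target directory.
name=$(basename "$file" .pdf)               # e.g. foo
suffix=""
target="${Destination}/${name}.pdf"
while [ -e "$target" ]; do
    suffix="${suffix}x"                     # x, xx, xxx, ...
    target="${Destination}/${name}.${suffix}.pdf"   # foo.x.pdf, foo.xx.pdf, ...
done
mv -- "$file" "$target"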

TL;DR: How do I get the count of files in a given directory that do not contain a given substring in their filename?

Best Answer

To count the files in a directory whose names do not end in .x.pdf, try:

find "${Destination}" -mindepth 1 ! -name '*.x.pdf' -printf '1' | wc -c

To count the files whose names do not end in a period, one or more x characters, a period, and pdf (that is, .x.pdf, .xx.pdf, .xxx.pdf, and so on), try:

find "${Destination}" -mindepth 1 ! -regex '.*\.x+\.pdf' -printf '1' | wc -c

The commands above search recursively through subdirectories. If you don't want that, add the option -maxdepth 1. For example:

find "${Destination}" -mindepth 1 -maxdepth 1 ! -regex '.*\.x+\.pdf' -printf '1' | wc -c

Note that because -printf '1' prints exactly one character per matching file, counting characters with wc -c is safe even if the directory contains filenames with newline characters (which would throw off a line count from ls | wc -l).
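
Plugged back into the reporting snippet from the question, that might look like the following (Destination is the question's own variable; the label text is just an assumption):

# Report on Current Status
echo -n "Cropped Files (unique): "
find "${Destination}" -mindepth 1 -maxdepth 1 ! -regex '.*\.x+\.pdf' -printf '1' | wc -c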
