Ubuntu – Moving like named files into self-named directories

bash, command line, directory, files, scripts

I have several thousand files in one directory that I would like to collate in directories like so:

From this:

└── Files
    ├── AAA.mkv
    ├── AAA.nfo
    ├── AAA-picture.jpg
    ├── BBB.mp4
    ├── BBB.srt
    ├── BBB-clip.mp4
    ├── CCC.avi
    ├── CCC.srt
    ├── CCC-clip.mov
    └── CCC.nfo

To this:

└── Files
    ├── AAA
    │   ├── AAA.mkv
    │   ├── AAA.nfo
    │   └── AAA-picture.jpg
    ├── BBB
    │   ├── BBB.mp4
    │   ├── BBB.srt
    │   └── BBB-clip.mp4
    └── CCC
         ├── CCC.avi
         ├── CCC.srt
         ├── CCC-clip.mov
         └── CCC.nfo

The file names vary in length and number of words, sometimes separated by spaces and possibly a few with hyphens (in addition to the ones ending in '-short'). They are primarily video files in a variety of formats/containers: mov/mpg/mkv/mp4/avi/ogg. Some are subtitled. Some have associated metadata files (.nfo or -clip).

Edit: The primary files are videos (this is where I would like to draw the directory name from). The associated files represent metadata. Some differ in name only by the extension. There are a half-dozen other variations on the base filename, like -clip.mp4, -clip.mov or -picture.jpg. I figured that if something were suggested for those few, I could (hopefully) work out the rest. In summary, AAA.mkv moves into a directory called AAA. Then all metadata files that begin with AAA join it (i.e., in this example: AAA-picture.jpg and AAA.nfo). So the basename is in fact only a substring of the file name in the case of AAA-picture.jpg. I would say it is probably relatively safe to simply use the hyphen as the delimiter… though '-clip' or '-picture' in its entirety would be safer.

How can I do this without getting carpal tunnel syndrome?
I looked at this but it was sufficiently different that my weak scripting abilities fizzled.

Thank you.

Best Answer

While your question is tagged with bash, it would be somewhat troublesome (in my humble opinion) to use bash for such a task. I'd suggest using Python because it has a lot of good functions for complex tasks, and this answer provides a solution in that language.

Essentially, we use a regex to split each filename at multiple delimiters, keep only the first part, and use the unique set of those first parts as the names of the new directories.
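As a minimal sketch of that splitting step on its own (the sample names come from the question's listing; the variable names `names`, `prefixes` and `unique_prefixes` are only illustrative and do not appear in the script):

```python
import re

# Sample names taken from the question's example listing
names = ["AAA.mkv", "AAA.nfo", "AAA-picture.jpg", "BBB.mp4", "BBB-clip.mp4"]

# Split each name at every "." or "-" and keep only the leading part
prefixes = [re.split("[.-]", name)[0] for name in names]

# A set drops the duplicates, leaving one entry per future directory
unique_prefixes = sorted(set(prefixes))
print(unique_prefixes)  # ['AAA', 'BBB']
```

Note that a name containing an early dot or hyphen (say, a hypothetical "My-Movie.mkv") would be cut at that first separator, so the character class may need adjusting for such names.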

We then traverse the top directory again and sort the files into their appropriate places.

The script doesn't do anything spectacular, and in terms of algorithm analysis it wouldn't do too well because of the nested for loops (every file name is checked against every directory name), but as a "quick and dirty, yet workable" solution it's alright. If you are interested in what each line does, there are plenty of comments explaining the functionality.
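If the nested loops ever became a concern, one possible alternative is to group the names in a single pass with a dictionary. This is only a sketch, not what the script below does: `group_by_prefix` is a hypothetical helper, and it matches on the exact prefix rather than on `startswith()`, so a name such as "AAAB.mkv" would get its own group instead of also matching "AAA".

```python
import re
from collections import defaultdict

def group_by_prefix(names):
    """Map each prefix (text before the first '.' or '-') to its file names."""
    groups = defaultdict(list)
    for name in names:
        prefix = re.split("[.-]", name)[0]
        groups[prefix].append(name)
    return dict(groups)

groups = group_by_prefix(["AAA.mkv", "AAA-picture.jpg", "BBB.mp4", "BBB.srt"])
print(groups)
# {'AAA': ['AAA.mkv', 'AAA-picture.jpg'], 'BBB': ['BBB.mp4', 'BBB.srt']}
```

Each key would then become a directory name, and each list the files to move into it.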

Note: the demo only prints the new file paths, for testing purposes. Uncomment the os.rename() line to actually move the files.

The Demo

bash-4.3$ # Same directory structure as in OP example
bash-4.3$ ls TESTDIR
AAA  AAA.mkv  AAA.nfo  AAA-picture.jpg  BBB  BBB-clip.mp4  BBB.mp4  BBB.srt
bash-4.3$ # now run script
bash-4.3$ ./collate_files.py ./TESTDIR
/home/xieerqi/TESTDIR/AAA/AAA-picture.jpg
/home/xieerqi/TESTDIR/AAA/AAA.mkv
/home/xieerqi/TESTDIR/AAA/AAA.nfo
/home/xieerqi/TESTDIR/BBB/BBB.srt
/home/xieerqi/TESTDIR/BBB/BBB.mp4
/home/xieerqi/TESTDIR/BBB/BBB-clip.mp4

Script itself

#!/usr/bin/env python
import re,sys,os

top_dir = os.path.realpath(sys.argv[1])

# Create list of items in directory first
# splitting names at multiple separators
dir_list = [os.path.join(top_dir,re.split("[.-]",f)[0])
            for f in os.listdir(top_dir)
]
# Creating set ensures we will have unique
# directory namings
dir_set = set(dir_list)

# Make these directories first
for dir in dir_set:
    if not os.path.exists(dir):
        os.mkdir(dir)

# now get all files only, no directories
files_list = [f for f in os.listdir(top_dir)
              if os.path.isfile(os.path.join(top_dir,f))
]

# Traverse lists of directories and files,
# check if a filename starts with directory
# that we're testing now, and if it does - move
# the file to that directory
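for target_dir in dir_set:
    id_string = os.path.basename(target_dir)
    for f in files_list:
        filename = os.path.basename(f)
        if filename.startswith(id_string):
            new_path = os.path.join(target_dir, filename)
            print(new_path)
            # files_list holds bare names, so join with top_dir
            # to get the real source path before moving
            #os.rename(os.path.join(top_dir, f), new_path)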

Additional notes:

  • The script can easily be adapted to split file names at other separators: inside the square brackets of the pattern passed to re.split() (currently "[.-]"), add whatever characters you want.
  • The moving part is performed with the os.rename() function. Alternatively, you could import shutil and use the shutil.move() function. See https://stackoverflow.com/a/8858026/3701431
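A minimal sketch of the shutil.move() variant, assuming Python 3 (for the exist_ok argument). The helper name `move_into_dir` and the temporary demo directory are illustrative only and are not part of the script above. One reason to prefer shutil.move() is that os.rename() can fail when source and destination are on different filesystems, whereas shutil.move() falls back to copying and then removing the source.

```python
import os
import shutil
import tempfile

def move_into_dir(src_path, dest_dir):
    """Move src_path into dest_dir, creating dest_dir if needed."""
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, os.path.basename(src_path))
    shutil.move(src_path, dest)
    return dest

# Demo in a throwaway temporary directory
top_dir = tempfile.mkdtemp()
src = os.path.join(top_dir, "AAA.mkv")
open(src, "w").close()  # create an empty placeholder file

moved = move_into_dir(src, os.path.join(top_dir, "AAA"))
print(os.path.exists(moved), os.path.exists(src))  # True False
```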