I have a directory containing about 7,000 music files. I used lame to recursively re-encode all files in it to a separate directory, outputting all files with the same relative path and file name. The output files have a .mp3 extension, but some of the input files had different extensions (.wma, .aac, etc).
Comparing file counts, I can see that about 100 files are missing from the output directory. What I want to do is compare the two directories and obtain a list of the files that exist in the source but not in the destination. This would be simple enough, except that I need to ignore differences in file extension.
I've tried using rsync with dry-run turned on, but I couldn't figure out a way to ignore file extensions. I've also tried diff, but was unable to find an option that compares by name only while ignoring file extensions. I started thinking I could just do a recursive ls on both directories, remove the file extensions, and then compare the outputs, but I really have no idea where to start with modifying the ls output using sed or awk.
Best Answer
To see a listing, here are two variants, one that recurses into subdirectories and one that doesn't. All use syntax specific to bash, ksh and zsh.
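A minimal sketch of what the two variants could look like in bash, assuming file names contain no newlines. The function names (list_stems_recursive, list_stems_flat, missing_recursive, missing_flat) are hypothetical helpers introduced for illustration, not part of any tool.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: list files with the last extension stripped, then
# let comm report the names that appear only in the source listing.

# Recursive listing: every regular file under "$1", relative path,
# last extension removed, sorted in the C locale (comm needs sorted input).
list_stems_recursive() {
  (cd "$1" && find . -type f | sed 's/\.[^./]*$//' | LC_ALL=C sort)
}

# Non-recursive listing: only the regular files directly inside "$1".
list_stems_flat() {
  (cd "$1" && for f in *; do
     if [ -f "$f" ]; then printf '%s\n' "${f%.*}"; fi
   done | LC_ALL=C sort)
}

# Names present in the source ($1) but absent from the destination ($2).
missing_recursive() {
  LC_ALL=C comm -23 <(list_stems_recursive "$1") <(list_stems_recursive "$2")
}

missing_flat() {
  LC_ALL=C comm -23 <(list_stems_flat "$1") <(list_stems_flat "$2")
}
```

For example, missing_recursive source dest would print one extension-less path per line for each source file that has no counterpart in dest. Hidden files such as .foo lose their whole name under this stripping rule, so the sketch is best suited to ordinary names like the music files in the question.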
Shorter, in zsh:
The comm command lists the lines that are common to two files (comm -12), that are only in the first file (comm -23) or that are only in the second file (comm -13). The numbers indicate what is subtracted from the output¹. The two input files must be sorted.
Here, the files are in fact the output of a command. The shell evaluates the <(…) construct by providing a “fake” file (a FIFO or a /dev/fd/ named file descriptor) as the argument to the command.
¹ So here the minus sayers are fully justified.
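To make the three comm modes concrete, here is a small bash sketch on made-up data. The functions first_list and second_list are hypothetical stand-ins for the two sorted listings.

```bash
#!/usr/bin/env bash
# Two sorted lists, produced by commands rather than stored in files.
first_list()  { printf '%s\n' apple banana cherry; }
second_list() { printf '%s\n' banana cherry date; }

# -12 suppresses columns 1 and 2, leaving the lines common to both inputs.
in_both=$(comm -12 <(first_list) <(second_list))

# -23 suppresses columns 2 and 3, leaving the lines only in the first input.
only_first=$(comm -23 <(first_list) <(second_list))

# -13 suppresses columns 1 and 3, leaving the lines only in the second input.
only_second=$(comm -13 <(first_list) <(second_list))

printf 'in both:\n%s\n' "$in_both"
printf 'only in first:\n%s\n' "$only_first"
printf 'only in second:\n%s\n' "$only_second"

# The argument that comm actually receives is a file name chosen by the shell.
printf 'process substitution expands to: %s\n' <(true)
```

The last line prints whatever path the shell substitutes for <(…), which varies between systems.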
If you want to perform actions on the files, you'll probably want to iterate over the source files.
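One way such a loop might look in bash, as a sketch under two assumptions: every re-encoded file ends in .mp3 (as stated in the question), and the destination path is absolute or relative to the directory the function is called from. The function name report_missing is hypothetical, and the printf line is only a placeholder for whatever action is needed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: walk the source tree ($1) and act on each file whose
# .mp3 counterpart is absent from the destination tree ($2).
report_missing() {
  local src=$1 dst=$2 file rel stem
  # NUL-delimited find output keeps unusual file names intact.
  while IFS= read -r -d '' file; do
    rel=${file#./}              # path relative to the source root
    stem=$rel
    case ${rel##*/} in          # strip the extension only if the file name has one
      *.*) stem=${rel%.*} ;;
    esac
    if [ ! -e "$dst/$stem.mp3" ]; then
      printf '%s\n' "$rel"      # placeholder action, e.g. re-run the encoder here
    fi
  done < <(cd "$src" && find . -type f -print0)
}
```

Called as report_missing source dest, it prints the relative path of each source file with no matching .mp3 in dest. Checking the base name before stripping avoids truncating a path at a dot that belongs to a directory name.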