I am trying to write a bash shell script (which is called by an Automator Action) to rename TV show DVD rips that I have named badly over the years. I want to remove part of the text in the filenames. I want to remove the text that appears after a specific series of characters that I know will always appear in the filename. But I do not know how many characters will appear before or after the known series of characters. I also don't know if the before or after text will contain multiple periods or dashes. An example would probably help:
The.Big.Bang.Theory.S01E01.xxxxxxxx.mp4
I know that each file will always contain a string in the format of SxxExx where the x's are always numbers. But I do not know what the numbers will be. I want to get the filename up to and including the SxxExx string and the file extension but strip out everything else. So for the above example I would end-up with:
The.Big.Bang.Theory.S01E01.mp4
I have tried using bash's built-in string replacement commands. I thought the expr index command would give me the integer start point of the SxxExx string and then I could use ${filename:offset:length} to extract only the required part of the filename (i already know the extension so that can be re-added). But it seems the OS X version of expr doesn't include the index functionality. I have only scripted in Basic and LotusScript before. In those environments this would have been fairly easy using commands such as 'Like' and 'Instr' or 'Mid'. But in bash I just can't figure it out. I have spent hours googling trying to understand how to use regular expressions to locate the 'SxxExx' substring in the filename but I just can't figure it out. I hope some clever UNIX scripters will be able to help me!
Best Answer
How does this work? First
ls
outputs the list of files, one per line, like so:Then
perl -nl
splits this into lines, feeding each to the regex, much like awk*. The regex captures 3 groups (denoted by parentheses), first the bit before SxxEyy, then that, then the file suffix. It then simply assembles amv
command suitable for renaming the files, like so:This can then be inspected and once you're satisfied it does what you want, piped into a shell by appending:
| sh
.*awk would normally be a good tool to use for this, but sadly only GNU awk supports regex capture groups and Mac OS X doesn't include gawk by default.