Folder Action for automatic file name cleanup

Tags: applescript, automator, finder, folder-action, smart-folders

I want to build a folder action that cleans up the filenames of my downloaded files.

For example, Youtube_MyVideofile_(1080p_30fps_H264-128kbit_AAC).mp4 should be stripped of "Youtube", "30fps", "128kbit", "AAC", "(" and ")", and each "_" should be replaced with a space. The result would be MyVideofile 1080p H264.mp4

I know I could do this with Automator, but then I would have to set up a "search/replace" element for every word. I'd rather use a single list of words, which would be easier to maintain: I get files from a lot of different sources on a regular basis, so the actual list of words to be removed will be very long and may be updated from time to time.

I found the question "Automator or AppleScript to Remove Multiple Strings from File Names?",
which is similar, but it only worked with selected folders. Instead, I want to set it up so it works automatically as a folder action.

I guess I therefore also need a whitelist of file extensions that the script won't touch, such as ".download" for Safari downloads that are still in progress.

Best Answer

Using Automator in macOS Sierra 10.12.5, I created a Folder Action with a single Run AppleScript action, using the AppleScript code below, and set it to run on my Downloads folder. (It has also been tested and works on OS X 10.8.5 and OS X 10.11.6.)

  1. Open Automator and select File > New, then Folder Action.
  2. Add a Run AppleScript action, replacing the default code with the code below.
  3. Modify the set theBlackWhiteList to POSIX path of ... line of code as necessary.
  4. Set the "Folder Action receives files and folders added to" Choose folder list box to your Downloads folder.
  5. Before saving the Folder Action, create the plain text data file that will be used by this Folder Action.
    • It's not absolutely necessary to create it first; however, if you are going to save it in Downloads, I would create the file before saving the Folder Action.
  6. Save the Automator Folder Action workflow.

Read the comments, included with the code, for what's necessary to use this code in the Folder Action.

To test the Folder Action, open Terminal, cd Downloads, and create the test file with:
touch 'Youtube_MyVideofile_(1080p_30fps_H264-128kbit_AAC).mp4'
This creates a zero-length file that will be processed by the Folder Action and renamed to MyVideofile 1080p H264.mp4, as shown in Downloads in Finder, or in Terminal with: ls -l My*.mp4

AppleScript code:

--  #   
--  #   The AppleScript code of this Folder Action requires a data file, which is laid out as follows:
--  #   
--  #   Lines 1 and 3 state what are on lines 2 and 4 respectively. (These lines are just reminders.)
--  #   
--  #   Line 2 must start with a single space character ' ', followed by the comma delimiter ','!
--  #   Line 2 must also not contain an underscore character '_' as it's used as a 'text item delimiter',
--  #   and all of them will be removed and replaced with a single space, as appropriate, in the last
--  #   part of the processing to form the final filename.
--  #   
--  #       This is used as part of the overall logic applied to creating the finished filename, so as to
--  #       only have a single space character between words of the filename, while ensuring the finished
--  #       filename does not start with nor have directly before the filename extension, a space character. 
--  #   
--  #   Line 2 is a list of strings that will be removed from the filename. (The Black List.)
--  #   Line 4 is a list of filename extensions of the file types that will be processed. (The White List.)
--  #   
--  #   Modify lines 2 and 4 as appropriate, while leaving the single space character at the start of line 2,
--  #   and do not include an underscore character in Line 2.
--  #   
--  #   Example contents of the plain text data file:
--  #   

--  #   # Do Not Remove This Line!: The next line contains a comma-delimited list of strings to be removed:
--  #    ,Youtube,30fps,128kbit,-,AAC,(,)
--  #   # Do Not Remove This Line!: The next line contains a comma-delimited list of file extensions to process:
--  #   mp4,mkv,avi,flv,flac

--  #   For the purposes of testing this script, the name of the data file used is 
--  #   "FileNameExtensionBlackWhiteCleanupList.txt", and is in the User's Downloads folder.
--  #   Obviously you can name it whatever you want and place it where appropriate access exists.
--  #   Modify the 'set theBlackWhiteList to POSIX path of ...' line of code, accordingly as necessary.


on run {input, parameters}
    try
        set theBlackWhiteList to POSIX path of (path to downloads folder) & "FileNameExtensionBlackWhiteCleanupList.txt"

        --  #   Make sure the data file exists and set its contents to the target variables.

        tell application "System Events"
            if (exists file theBlackWhiteList) then
                tell current application
                    set theBlackWhiteList to (read theBlackWhiteList)
                    set AppleScript's text item delimiters to {","}
                    set theStringsToRemoveList to text items of paragraph 2 of theBlackWhiteList as list
                    set theFileExtensionsList to text items of paragraph 4 of theBlackWhiteList as list
                    set AppleScript's text item delimiters to {}
                end tell
            else
                tell current application
                    activate
                    display dialog "The required file, " & quoted form of theBlackWhiteList & ", is missing!" & ¬
                        linefeed & linefeed & "Replace the missing file from backup." buttons {"OK"} ¬
                        default button 1 with title "File Not Found" with icon 0 -- (icon stop)
                    return
                end tell
            end if
        end tell

        --  # Process the target file(s) added to the target folder, that have the target filename extensions. 

        tell application "Finder"
            set theFileList to input
            repeat with thisFile in theFileList
                set theFileName to name of thisFile
                set theOriginalFileName to theFileName
                --  #   Get the filename extension of thisfile.
                set AppleScript's text item delimiters to {"."}
                set thisFileExtension to last text item of theFileName as string
                --  #   Only process if thisFileExtension is in theFileExtensionsList. 
                if theFileExtensionsList contains thisFileExtension then
                    repeat with i from 1 to count of theStringsToRemoveList
                        set AppleScript's text item delimiters to item i of theStringsToRemoveList
                        set theTextItems to text items of theFileName
                        set AppleScript's text item delimiters to {"_"}
                        set theFileName to theTextItems as string
                        set AppleScript's text item delimiters to {}
                    end repeat
                    --  #                       
                    --  #   Using the example filename in the OP, 'Youtube_MyVideofile_(1080p_30fps_H264-128kbit_AAC).mp4',
                    --  #   at this point in the processing it would be, '__MyVideofile__1080p___H264_____.mp4', and while one 
                    --  #   probably could continue to use AppleScript 'text items' and 'text item delimiters', nonetheless I can do
                    --  #   it easier using 'sed' to finish getting the final filename. This is also part of the reason I started the
                    --  #   'theStringsToRemoveList' with a single space character and do not allow an underscore character in Line 2.
                    --  #                       
                    tell current application
                        set theFileName to (do shell script "printf " & quoted form of theFileName & " | sed -E -e 's/[_]{2,}/_/g' -e 's/^_//' -e 's/_\\./\\./g' -e 's/_/ /g'")
                    end tell
                    --  #   Only change the filename if it has actually changed by the processing above.
                    --  #   There's no sense in telling Finder to name a file the same name it already is. 
                    if theFileName is not equal to theOriginalFileName then
                        try
                            set the name of thisFile to theFileName
                        end try
                    end if
                    --  #   At this point the final filename, using the example filename, would be 'MyVideofile 1080p H264.mp4'.
                    --  #   This assumes this filename didn't already exist, which is why the 'set the name of thisFile' command
                    --  #   is within a 'try' statement. Additional coding and logic could be applied to increment the filename
                    --  #   if it already existed.
                end if
            end repeat
            set AppleScript's text item delimiters to {}
        end tell

    on error eStr number eNum
        set AppleScript's text item delimiters to {}
        display dialog eStr & " number " & eNum buttons {"OK"} default button 1 with icon caution
        return
    end try
end run

Example contents of the plain text data file used by the Folder Action:

# Do Not Remove This Line!: The next line contains a comma-delimited list of strings to be removed:
 ,Youtube,30fps,128kbit,-,AAC,(,)
# Do Not Remove This Line!: The next line contains a comma-delimited list of file extensions to process:
mp4,mkv,avi,flv,flac
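As a quick way to sanity-check a data file from Terminal, the white list lookup can be approximated in plain sh. This is only a sketch, not part of the Folder Action: the function name is_whitelisted is hypothetical, and unlike AppleScript's contains (which ignores case by default), this comparison is case sensitive.

```shell
#!/bin/sh
# Hypothetical helper (not part of the answer's AppleScript):
# succeeds if the extension of filename $1 appears on line 4 of data file $2.
is_whitelisted() {
    ext=${1##*.}                    # text after the last dot in the name
    list=$(sed -n '4p' "$2")        # line 4 holds the comma-delimited white list
    case ",$list," in
        *",$ext,"*) return 0 ;;     # exact, case-sensitive match on one list item
        *)          return 1 ;;
    esac
}

# Example call, assuming the data file name used in the answer:
# is_whitelisted 'clip.mp4' "$HOME/Downloads/FileNameExtensionBlackWhiteCleanupList.txt" && echo "would be processed"
```

With the example data file, a name ending in .download fails this check, so an in-progress Safari download would be left alone.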

The logic behind the renaming process:

The variable theStringsToRemoveList starts with a single space character followed by the comma delimiter. Used in conjunction with the underscore character as the text item delimiter, it turns all spaces, along with all other strings to be removed, into underscores during the text items and text item delimiters portion of the AppleScript code.

This is done so sed can be used to replace all consecutive underscore characters with a single underscore character, then remove the leading underscore, if it exists, then remove an underscore directly preceding the dot before the filename extension, if it exists, and finally replace all remaining single underscore characters with a single space character.
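The first stage described above (turning every listed string into an underscore) can be tried outside of AppleScript. The sketch below is an approximation in sh and awk, not the answer's code: the function name to_underscores is hypothetical, and the matching here is case sensitive, whereas AppleScript's text item delimiters ignore case unless told otherwise.

```shell
#!/bin/sh
# Hypothetical helper: replace every black list string in a name with "_".
# $1 = filename, $2 = comma-delimited black list (line 2 of the data file).
to_underscores() {
    # Values are passed through the environment so awk does not
    # interpret backslashes in them as escape sequences.
    NAME="$1" LIST="$2" awk 'BEGIN {
        name = ENVIRON["NAME"]
        n = split(ENVIRON["LIST"], items, ",")
        for (i = 1; i <= n; i++) {
            if (items[i] == "") continue
            out = ""
            # index() searches for a literal string, so ( and ) need no escaping.
            while ((p = index(name, items[i])) > 0) {
                out = out substr(name, 1, p - 1) "_"
                name = substr(name, p + length(items[i]))
            }
            name = out name
        }
        print name
    }'
}

to_underscores 'Youtube_MyVideofile_(1080p_30fps_H264-128kbit_AAC).mp4' ' ,Youtube,30fps,128kbit,-,AAC,(,)'
# prints: __MyVideofile__1080p___H264_____.mp4
```

The printed value is the same intermediate name quoted in the code comments, which is the input the sed command then cleans up.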

set theFileName to (do shell script "printf " & quoted form of theFileName & " | sed -E -e 's/[_]{2,}/_/g' -e 's/^_//' -e 's/_\\./\\./g' -e 's/_/ /g'")
  • set theFileName to - The variable theFileName will contain the output of the do shell script command.
  • do shell script "command" - Runs the command in a shell.
  • printf " & quoted form of theFileName & " | - Prints the value of the variable theFileName, and pipes | it to the sed command.

  • sed -E -e 's/[_]{2,}/_/g' -e 's/^_//' -e 's/_\\./\\./g' -e 's/_/ /g'

  • sed - Stream EDitor.

  • -E - Interpret regular expressions as extended (modern) regular expressions rather than basic regular expressions (BRE’s). The re_format(7) manual page fully describes both formats.
  • -e command - Append the editing commands specified by the command argument to the list of commands.
  • s/[_]{2,}/_/g
    • s - The substitute command, in the form s/pattern/replacement/flags.
    • [_] - Matches a single character present in the list; here, the character _ literally (case sensitive).
    • {2,} - Quantifier: matches the preceding item 2 or more times, as many times as possible (greedy).
    • /_/ - Replaces the matched pattern with a single character _ literally (case sensitive).
    • g - Global flag: replaces all occurrences of the pattern on the line, not just the first match.
  • s/^_//
    • ^ - Asserts position at start of the string.
    • _ - Matches the character _ literally (case sensitive).
    • // - Replaces the matched pattern with literally nothing.
  • s/_\\./\\./g
    • _ - Matches the character _ literally (case sensitive).
    • \\. - Matches the character . literally (case sensitive).
    • /\\./ - Replaces the matched pattern with the character . literally (case sensitive).
      • Note: The double backslash \\ is necessary when used in a do shell script command, because AppleScript itself treats the backslash as an escape character inside a string. From the command line, a single backslash \ would be used to make the character that follows, . in this case, a literal character.
  • s/_/ /g
    • Replaces every character _ literally with a single space character.

Note that the info above is abbreviated in places; however, it should provide a bit of an understanding of what's happening.
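To watch the four substitutions work without triggering the Folder Action, the same sed command can be run from Terminal. This is a minimal sketch: the function name collapse_underscores is hypothetical, and it uses printf '%s' rather than a bare printf so that a % or \ in a filename is not treated as a format directive, which is a small deviation from the command in the script.

```shell
#!/bin/sh
# Hypothetical wrapper around the sed command from the Folder Action.
# Only single backslashes are needed here because this is plain shell,
# not an AppleScript string.
collapse_underscores() {
    printf '%s' "$1" | sed -E \
        -e 's/[_]{2,}/_/g' \
        -e 's/^_//' \
        -e 's/_\./\./g' \
        -e 's/_/ /g'
}

collapse_underscores '__MyVideofile__1080p___H264_____.mp4'
# prints: MyVideofile 1080p H264.mp4
```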

On an added note, if you also want to ensure capitalization of each word in the filename, replace the existing do shell script command with the one below, which has an added awk command that receives the output from sed to perform the capitalization. Note that I found this awk command on the Internet and tested that it works; however, I will not be adding an explanation of how it functions, for lack of time.

set theFileName to (do shell script "printf " & quoted form of theFileName & " | sed -E -e 's/[_]{2,}/_/g' -e 's/^_//' -e 's/_\\./\\./g' -e 's/_/ /g' | awk '{for(i=1;i<=NF;i++){ $i=toupper(substr($i,1,1)) substr($i,2) }}1'")
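For readers who want to see what the awk part does, here is the same awk program spread over several lines with comments, as it might be run from Terminal. The wrapper function capitalize_words is a hypothetical name used only for this illustration.

```shell
#!/bin/sh
# Hypothetical wrapper: capitalize the first letter of each word in $1.
capitalize_words() {
    printf '%s\n' "$1" | awk '
        {
            # NF is the number of whitespace-separated fields (words) on the line.
            for (i = 1; i <= NF; i++) {
                # Upper-case the first character, then append the rest unchanged.
                $i = toupper(substr($i, 1, 1)) substr($i, 2)
            }
        }
        1   # always-true pattern with no action: print the rebuilt line
    '
}

capitalize_words 'myvideofile 1080p h264.mp4'
# prints: Myvideofile 1080p H264.mp4
```

Because the filename extension is not separated from the last word by a space, it is left as is. Assigning to a field makes awk rebuild the line with single spaces between words, which is harmless here since sed has already collapsed the separators.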

Update to address .'s in the filename, per the comments.

In the plain text data file, on Line 2, add ., after the leading space and its comma delimiter. In other words, the first item in the list on Line 2 is a blank space, followed by a comma delimiter, followed by . and another comma delimiter, and so on.

Add the following lines of code after the repeat loop that is directly before the comment starting with -- # Using the example filename in the OP ..., which is above the tell current application ... do shell script block of code.

            set AppleScript's text item delimiters to {"_" & thisFileExtension}
            set theTextItems to text items of theFileName
            set AppleScript's text item delimiters to {"_"}
            set theFileName to (theTextItems as string) & "." & thisFileExtension
            set AppleScript's text item delimiters to {}    

By adding the ., to Line 2 in the plain text data file, every . in the filename is replaced with _ by the original code. The extra lines of code above then replace e.g. _mp4 with .mp4, or with . and whatever the actual filename extension is.

Now when it gets to the do shell script command, the only . left is the one for the filename extension, and all the underscores are processed out of the name as they should be.

Obviously the way the original code is coded, underscores cannot be a part of the final filename, and this modification to the original code doesn't change that.
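The effect of the dot update can be illustrated from Terminal with a short sketch. Note that this uses a different technique from the AppleScript lines above: instead of replacing every dot and then restoring the extension, it splits the name at the last dot first. The function name dots_to_underscores is hypothetical, and the sketch assumes the name has an extension, which holds for any file that passed the white list check.

```shell
#!/bin/sh
# Hypothetical helper: turn every dot except the last one into "_".
# Assumes $1 contains at least one dot (i.e. it has a filename extension).
dots_to_underscores() {
    ext=${1##*.}     # text after the last dot
    stem=${1%.*}     # text before the last dot
    printf '%s.%s\n' "$(printf '%s' "$stem" | tr '.' '_')" "$ext"
}

dots_to_underscores 'My.Video.File.mp4'
# prints: My_Video_File.mp4
```

The underscores produced this way would then be collapsed and turned into spaces by the same sed command as before.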