How to strip a filename of special characters

bash terminal unix

I have a file with a name such as "Today's Date.txt"

What I am interested in is stripping away all special characters using the terminal such as:

" - , ' ' [ 

The reason is that I feed these names into a script later, and accounting for each name and changing it individually is too much of a headache.

Spaces " ", underscores "_", and letters "A-Z, a-z" are fine, and replacing the special characters rather than erasing them is also acceptable.

At first I thought the terminal command "iconv" could help me by converting to a simpler encoding, but I tried out several of the encodings and it seems I might be mistaken.

I know regular expressions might help me, but sadly I am not well versed in them. I found this question that seems related, but I don't know how to implement it or whether it covers the same cases as mine.

I posted this here because the question might be specific to the character set OSX supports for filenames and the encoding it uses…although it's more likely I have no clue what I'm talking about.

Thank you for your help in advance.

Edit: The command

sed 's/[!@#\$%^&*()]//g'

seems to work very well, but I can't get it to work for my original use case or for these characters:

' ` "

Escaping them doesn't work either. I'm very new to bash scripting, so please bear with me.
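The quote characters are hard to match mainly because the shell interprets them before sed ever sees them. A minimal sketch of one way around that, assuming the sed script is wrapped in double quotes instead of single quotes (strip_quotes is a hypothetical helper name, not something from the question):

```shell
# Hypothetical helper: remove single quotes, double quotes and backticks
# from whatever arrives on standard input.
strip_quotes() {
  # The sed script is in double quotes, so a literal ' needs no escaping.
  # The " and ` are backslash-escaped so the shell passes them through;
  # sed itself receives the bracket expression ['"`].
  sed "s/['\"\`]//g"
}

# Example: feed a filename through the helper.
printf '%s\n' "Today's \`Date\`.txt" | strip_quotes
```

Inside single quotes nothing can be escaped at all, which is why a backslash in front of ' has no effect there.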


Edit 2: Posting this here or else I'd have to wait 6 hours.

In addition to Alan Shutko's Answer, I would like to add my own solution that I found.

awk '{gsub(/[[:punct:]]/,"")}1'

I'm kind of hesitant to post this since I cannot explain it well.

Awk, as its man page says, is a "pattern-directed scanning and processing language". The gsub function replaces every match of the regular expression you give it. A gsub call looks like this:

gsub("a","b")

In this example, all occurrences of a would be replaced by b. As the comment above suggests, [[:punct:]] appears to stand for all punctuation marks. However, I do not know what the 1 outside the curly braces stands for.
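For what it's worth, in awk a program is a list of pattern { action } pairs, and a pattern with no action gets the default action of printing the current line. The bare 1 is simply a pattern that is always true, so it prints each line after gsub has modified it. A small sketch showing the two equivalent spellings (both function names are hypothetical):

```shell
# Hypothetical helper: the one-liner from above, unchanged in behavior.
strip_punct() {
  awk '{ gsub(/[[:punct:]]/, "") } 1'
}

# Same program with the print written out instead of the bare 1.
strip_punct_explicit() {
  awk '{ gsub(/[[:punct:]]/, ""); print }'
}

# Both print the same result for the same input.
printf '%s\n' "Today's Date.txt" | strip_punct
printf '%s\n' "Today's Date.txt" | strip_punct_explicit
```

Note that [[:punct:]] also covers the dot and the underscore, so this removes the "." in front of the extension as well.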

Best Answer

If you have a specific set of characters that you want to keep, tr works very well.

For example

tr -cd 'A-Za-z0-9_-'

This removes every character that is not in the listed set. (The -d means delete, and -c means the complement of the listed characters: in other words, any character not listed gets deleted.)
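To apply this to actual filenames, something along these lines might work. It is only a sketch under a few assumptions: the helper names clean_name and rename_clean are made up for illustration, the files live in the current directory (a "/" in a path would be stripped too), and the kept set is extended with a space and a dot so that spaces and the file extension survive.

```shell
# Hypothetical helper: print a cleaned-up version of one filename.
# Assumption: space and dot are added to the kept set; the trailing "-"
# is a literal hyphen because it is the last character of the set.
clean_name() {
  printf '%s' "$1" | LC_ALL=C tr -cd 'A-Za-z0-9 ._-'
}

# Hypothetical helper: rename each file given as an argument.
rename_clean() {
  for f in "$@"; do
    new=$(clean_name "$f")
    # Skip names that are already clean or that would become empty.
    [ -n "$new" ] && [ "$new" != "$f" ] || continue
    # Never overwrite an existing file.
    if [ -e "$new" ]; then
      echo "skipping $f: $new already exists" >&2
      continue
    fi
    mv -- "$f" "$new"
  done
}

# Preview the result for one name without touching any file.
clean_name "Today's Date.txt"; echo
```

Running rename_clean * in a directory would then rename everything in it; previewing with clean_name first is the safer way to check the kept set.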