I have a file with a name such as "Today's Date.txt"
What I am interested in is stripping away all special characters using the terminal such as:
" - , ' ' [
I plug these names into a script later, and accounting for every special character or renaming the files individually is too much of a headache.
Spaces " ", underscores "_", and letters "A-Z, a-z" are OK, and replacing the special characters rather than erasing them is fine too.
At first I thought the terminal command "iconv" could help me by converting to a simpler encoding, but I tried out several of the encodings and it seems I might be mistaken.
I know regular expressions might help me, but sadly I am not well versed in them. I found this question that seems related, but I don't know how to implement it or whether it covers the same cases as mine.
I posted this here because the question might be specific to the character set OS X supports for filenames and the encoding it uses…although it's more likely I have no clue what I'm talking about.
Thank you for your help in advance.
Edit: The command
sed 's/[!@#\$%^&*()]//g'
seems to work very well, but I can't get it to work for the characters in my original use case, among others:
' ` "
Escaping them doesn't work either. I'm very new to bash scripting, so please bear with me.
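One likely reason escaping fails: inside a single-quoted shell string nothing can be escaped, so a ' always ends the string. A minimal sketch of one workaround is to wrap the sed expression in double quotes instead and backslash-escape only the " and the backtick. The function name strip_quotes is made up for illustration.

```shell
#!/bin/sh
# strip_quotes is a hypothetical helper name, not a standard command.
# The sed script is double-quoted so it can contain a single quote;
# \" and \` are escaped for the shell, so sed receives: s/["'`]//g
strip_quotes() {
  sed "s/[\"'\`]//g"
}

printf '%s\n' "Today's \"Date\`.txt" | strip_quotes
# prints: Todays Date.txt
```

The same bracket expression can be extended with the characters from the earlier sed command if they should be removed in one pass.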
Edit 2: Posting this here or else I'd have to wait 6 hours.
In addition to Alan Shutko's Answer, I would like to add my own solution that I found.
awk '{gsub(/[[:punct:]]/,"")}1'
I'm kind of hesitant to post this since I cannot explain it well.
Awk, as its man page says, is a "pattern-directed scanning and processing language". The gsub function replaces every match of the regular expression you give it. A gsub call looks like this:
gsub("a","b")
Here, all occurrences of a would be replaced by b. As the comment above suggests, [[:punct:]] appears to stand for all punctuation marks. However, I do not know what the 1 outside the curly braces stands for.
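On the trailing 1: an awk program is a list of pattern { action } pairs, and a pattern with no action defaults to printing the current line. Since 1 is always true, it acts as shorthand for "print every line" after gsub has modified it. The sketch below puts the short form next to an explicit print; both function names are made up for illustration. Note that [[:punct:]] also matches the dot before the extension.

```shell
#!/bin/sh
# Hypothetical helper names, used only to compare the two spellings.

# Short form: the bare 1 is an always-true pattern whose default action is print.
strip_punct() {
  awk '{ gsub(/[[:punct:]]/, "") } 1'
}

# Same behavior with the print written out.
strip_punct_explicit() {
  awk '{ gsub(/[[:punct:]]/, ""); print }'
}

printf '%s\n' "Today's Date.txt" | strip_punct
# prints: Todays Datetxt   (the "." is punctuation too)
```

If the extension needs to survive, a narrower bracket expression listing only the unwanted characters may be a better fit than [[:punct:]].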
Best Answer
If you have a specific set of characters that you want to keep, tr works very well.
For example,
tr -cd 'A-Za-z0-9_-'
will remove any characters not in the listed set. (The -d means delete, and the -c means the complement of the characters listed: in other words, any character not listed gets deleted.)
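As written, that set also deletes spaces and the dot before the extension. A rough sketch of applying the same idea to actual filenames, with a space and "." added to the kept set, might look like the following. The names sanitize_name and rename_clean are hypothetical, and LC_ALL=C is an assumption made so that tr works byte by byte on names containing non-ASCII characters.

```shell
#!/bin/sh
# sanitize_name and rename_clean are hypothetical helper names.

# Print the given name with everything outside the kept set deleted.
# LC_ALL=C (an assumption) makes tr process bytes, so non-ASCII bytes are dropped.
sanitize_name() {
  printf '%s' "$1" | LC_ALL=C tr -cd 'A-Za-z0-9 ._-'
}

# Rename every regular file in directory $1 to its sanitized name.
rename_clean() {
  for f in "$1"/*; do
    [ -f "$f" ] || continue
    old=$(basename "$f")
    new=$(sanitize_name "$old")
    # Skip names that would become empty or that are already clean.
    if [ -z "$new" ] || [ "$new" = "$old" ]; then
      continue
    fi
    # -n refuses to overwrite an existing file with the same cleaned name.
    mv -n -- "$f" "$1/$new"
  done
}

sanitize_name "Today's Date.txt"; echo
# prints: Todays Date.txt
```

Running rename_clean on a copy of the directory first is a sensible precaution, since two different names can collapse to the same cleaned name.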