I'm not sure exactly what you mean by "special characters", so I'm going to assume that you want to get rid of non-ASCII characters. There are a few different tools that might work for you. The first few that come to mind for me are:
iconv
(internationalization conversion)
tr
(translate)
sed
(stream editor)
iconv
(internationalization conversion)
Here is a solution using iconv:
iconv -c -f utf-8 -t ascii input_file.csv
The -f
flag (from) specifies an input format, the -t
flag (to) specifies an output format, and the -c
flag tells iconv
to discard characters that cannot be converted to the target. This writes the results to standard output (i.e. to your console). If you want to write the results to a new file you would do something like this instead:
iconv -c -f utf-8 -t ascii input_file.csv -o output_file.csv
Then, if you want, you can replace the original file with the new file:
mv -i output_file.csv input_file.csv
Here is how iconv
handles your first example string:
$ echo "'÷ÞW' , 'ŸŸŸŸŸŸŸ', '³ŸŸÙ÷'" | iconv -c -f utf8 -t ascii
'W' , '', ''
tr
(translate)
Here is a solution using the tr (translate) command:
cat input_file.csv | tr -cd '\000-\177'
The \000-\177
pattern specifies the numerical range 0-127 using octal notation. This is the range of values for ASCII characters. The -c
flag tells tr
to match values in the complement of this range (i.e. to match non-ASCII characters) and the -d
flag tells tr
perform deletion (instead of translation).
To write the results to a file you would use output redirection:
cat input_file.csv | tr -cd '\000-\177' > output_file.csv
Here is how tr
handles your first example string:
$ echo "'÷ÞW' , 'ŸŸŸŸŸŸŸ', '³ŸŸÙ÷'" | tr -cd '\000-\177'
'W' , '', ''
sed
(stream editor)
Here is a solution using sed:
sed 's/[\d128-\d255]//g' input_file.csv
The s
prefix tells sed
to perform substitution, the g
suffix tells sed
to match patterns globally (by default only the first occurrence is matched), the pattern [\d128-\d255]
tells sed
to match characters with decimal values in the range 128-255 (i.e. non-ASCII characters), and the empty string between the second and third forward-slashes tells sed
to replace matched patterns with the empty string (i.e. to remove them).
Unlike many other programs, sed
has an option to update the file in-place (instead of manually writing to a different file and then replacing the original):
sed -i 's/[\d128-\d255]//g' input_file.csv
Here is how sed
handles your first example string:
$ echo "'÷ÞW' , 'ŸŸŸŸŸŸŸ', '³ŸŸÙ÷'" | sed 's/[\d128-\d255]//g'
'W' , '', ''
Best Answer
If the first character of file name is printable but neither alphanumeric nor whitespace you can use
[[:punct:]]
glob operator: