That's a typical job for tr:
LC_ALL=C tr '\0-\10\13\14\16-\37' '[ *]' < in > out
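Here is a quick check of what that command does. It is a sketch: blank_controls is a hypothetical name for a shell function that wraps the command above, and the sample bytes are arbitrary.

```shell
# blank_controls is a hypothetical wrapper name around the tr command above.
# It turns every ASCII control character except TAB (\11), LF (\12) and
# CR (\15) into a space. All other bytes pass through unchanged.
blank_controls() {
  LC_ALL=C tr '\0-\10\13\14\16-\37' '[ *]'
}

# \001 (SOH) and \033 (ESC) become spaces; the tab and the newline survive.
printf 'a\001b\tc\033d\n' | blank_controls
```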
In your case, it doesn't work with sed because you're in a locale where those ranges don't make sense. If you want to work with byte values rather than characters, with ranges ordered by the numerical value of those bytes, your best bet is to use the C locale. Your code would have worked with GNU sed under LC_ALL=C, but using sed (let alone perl) is a bit overkill here. Also, those \xXX escapes are not portable across sed implementations, while this tr approach is POSIX.
You can also trust your locale's idea of what printable characters are with:
tr -c '[:print:]\t\r\n' '[ *]'
But with GNU tr (as typically found on Linux-based systems), that only works in locales where characters are single-byte (so typically not UTF-8).
In the C locale, that would also turn DEL (0x7f) and every byte value above it (the non-ASCII bytes) into spaces, since none of them count as printable there.
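You can observe the C-locale behaviour just described with the following sketch. The sample bytes are arbitrary: the UTF-8 encoding of "é" followed by DEL.

```shell
# Replace everything that is not printable in the C locale (plus TAB, CR, LF).
# In the C locale, [:print:] only covers 0x20-0x7e, so DEL (\177) and every
# non-ASCII byte, such as the two bytes of UTF-8 "é" (\303\251), become spaces.
printf 'caf\303\251\177!\n' | LC_ALL=C tr -c '[:print:]\t\r\n' '[ *]'
```

Here "é" turns into two spaces because the C locale sees two separate non-printable bytes, not one character.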
In UTF-8 locales, you could use GNU sed, which doesn't have the problem GNU tr has:
sed 's/[^[:print:]\r\t]/ /g' < in > out
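Here is a sketch of the sed approach on sample input. Two things are assumptions: blank_nonprint is a hypothetical function name, and C.UTF-8 is just one UTF-8 locale that is often installed. Use whichever one `locale -a` lists on your system.

```shell
# blank_nonprint is a hypothetical wrapper around the GNU sed command above:
# it replaces every non-printable character except CR and TAB with a space.
# \r and \t inside the bracket expression are a GNU extension and are not
# recognised when POSIXLY_CORRECT is set in the environment.
blank_nonprint() {
  sed 's/[^[:print:]\r\t]/ /g'
}

# Run in a UTF-8 locale (C.UTF-8 is an assumption). There, the two bytes of
# "é" form one printable character and are kept, while \001 becomes a space.
(
  export LC_ALL=C.UTF-8
  printf 'caf\303\251\001!\n' | blank_nonprint
)
```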
(Note that those \r and \t are not standard. If POSIXLY_CORRECT is in the environment, GNU sed won't recognize them and will instead treat backslash, r and t as members of the set, as POSIX requires.)
It would not, however, convert bytes that don't form valid characters, if there are any.
Like many, if not most, text-parsing tools, perl can read the files named on its command line, so there's no need for cat. You just need -e, which lets you pass a script as a command-line argument, and -n, which means "run the script on each line of input". Alternatively, you can use the -p switch, which means "run the script on each line of input, then print that line". These two commands are equivalent (but the second is a classic useless use of cat; use the first):
perl -pe 's/foo/bar/' file
cat file | perl -pe 's/foo/bar/'
Now, if I understand correctly, you want to delete all LaTeX comments (though that's not what your question states). If so, a negative lookbehind is the easiest way:
perl -pe 's/(?<!\\)%.*//' file
Your regex should also work; you just need to keep the character you matched before the % and escape the backslash:
perl -pe 's/(^|[^\\])%.*/$1/' file
You can do the same thing with GNU sed:
sed -r 's/(^|[^\\])%.*/\1/' file
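The sed version gives the same results on the same kind of sample input. This sketch assumes GNU sed; newer releases also accept -E as a synonym for -r.

```shell
# Extended regex (-r): \1 restores the character captured before the %,
# or nothing when the % is at the start of the line.
printf '%s\n' 'a % b % c' '% whole line' 'x \% y % z' |
  sed -r 's/(^|[^\\])%.*/\1/'
```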
You have a space after \1 in your replacement; just remove it and you should be good.
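The stray space shows up clearly in the following sketch. It assumes the command looked like the sed -r one above, with "\1 " as the replacement. Piping through sed -n l marks each line end with $, which makes trailing spaces visible.

```shell
# Assumed original: a space after \1 is copied into every line that had a
# comment, leaving an extra trailing blank.
printf '%s\n' 'text % comment' | sed -r 's/(^|[^\\])%.*/\1 /' | sed -n l
# prints: text  $

# Without the space, only the character captured before the % remains.
printf '%s\n' 'text % comment' | sed -r 's/(^|[^\\])%.*/\1/' | sed -n l
# prints: text $
```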