Sed or tr one-liner to delete all numeric digits

regular expression, sed, text processing

So, I have this text file, and it consists mostly of alphanumeric characters. It's a standard document. But since I copied and pasted it from a PDF, there are page numbers in there. I don't much care about the occasional number that isn't a page number, so I figure I'll wipe out all the digits with sed or tr. That's at least marginally faster than finding and replacing zero, then one, then two, and so on in the GUI, after all.

So how do I do that?

Best Answer

To remove all digits, here are a few possibilities:

tr -d 0-9 <old.txt >new.txt
tr -d '[:digit:]' <old.txt >new.txt
sed -e 's/[0-9]//g' <old.txt >new.txt
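
As a quick sanity check, the three commands can be run on a small sample to confirm they give the same result. This is only a sketch: the sample text and variable names are made up for illustration, and printf stands in for old.txt.

```shell
# Hypothetical sample text; the real input would be old.txt.
sample='page 12 of 30 pages
no digits here'

# Each command deletes every ASCII digit and leaves all other characters alone.
out_tr=$(printf '%s\n' "$sample" | tr -d 0-9)
out_class=$(printf '%s\n' "$sample" | tr -d '[:digit:]')
out_sed=$(printf '%s\n' "$sample" | sed -e 's/[0-9]//g')

printf '%s\n' "$out_tr"
```

Note that only the digits go away: the spaces around them stay, and a line that held nothing but a page number becomes an empty line rather than disappearing.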

If you just want to get rid of the page numbers, there's probably a better regexp you can use to recognize just those digits that are page numbers. For example, if the page numbers are always alone on a line, possibly with spaces before or after them, the following command will delete just the lines containing nothing but a number and optional surrounding spaces:

sed -e '/^ *[0-9]\+ *$/d' <old.txt >new.txt

(\+ is a GNU extension; with some sed implementations you may need the standard alternative \{1,\}, or you can write [0-9][0-9]* instead.)
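
A portable variant of the line-deleting command might look like the sketch below. It uses [0-9][0-9]* in place of \+, and it swaps the literal space for [[:space:]] so that tabs count too; that swap is my assumption about what the pasted text may contain, not something the command above does. The sample input and variable names are made up for illustration.

```shell
# Hypothetical sample: "12" and a tab-indented "7 " stand in for page-number lines.
sample=$(printf 'Intro text\n12\nChapter 2 begins\n\t7 \nClosing text')

# POSIX-only syntax: [[:space:]] matches spaces and tabs, [0-9][0-9]* means "one or more digits".
cleaned=$(printf '%s\n' "$sample" | sed -e '/^[[:space:]]*[0-9][0-9]*[[:space:]]*$/d')

printf '%s\n' "$cleaned"
```

The line "Chapter 2 begins" survives because the pattern is anchored at both ends and only matches lines made up entirely of digits and whitespace.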

You don't need to use the command line for this, though. Any halfway decent editor has regexp search and replacement capabilities.
