Using sed to remove digits and white space from a string

regex sed

I am trying to remove the first occurrence of digit(s), the dot, the second occurrence of digit(s), and the space before the word.

I have come up with this regex:

sed 's/^[0-9]\+.[0-9]\+\s//' input.txt > output.txt

Text sample:

2.14 Italien
2.15 Japonais

My regex does not work unfortunately. There is a problem with the \s but I can't pinpoint what it is…

Can anyone help?

edit: The problem is that I need to remove only the first space, as some entries contain further spaces, as you can see below:

3.15 Chichewa
3.16 Chimane
3.17 Cinghalais
3.18 Créole de Guinée-Bissau

Best Answer

The command you're using should work as-is with GNU sed. But with BSD sed, which for example comes with OS X, it won't.

  • If you're trying to use Extended Regular Expressions – which support the + metacharacter – you need to explicitly enable them. For BSD sed you do this with sed -E, and for GNU sed with sed -r.

    The \+ alone does work with GNU sed when EREs are not enabled, but this is less portable.

  • You're using the Perl-like \s, which exists in neither Basic nor Extended Regular Expressions, and standard sed doesn't support Perl regular expressions. GNU sed does support \s as an extension, but it'd be more portable to simply put a literal space in your regular expression.

  • Finally, your . matches any single character, so your regex would accept any character in that place, not just a dot. Use \. to match a literal dot.
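To make the last point concrete, here is a small sketch of the difference the escaped dot makes. The input "2x14 Italien" is a made-up line, not one from the question, and the example assumes a sed that accepts -E (BSD sed and recent GNU sed releases do).

```shell
# Hypothetical input: "2x14" is not a number, but the unescaped dot matches the "x",
# so the prefix is stripped anyway.
echo "2x14 Italien" | sed -E 's/^[0-9]+.[0-9]+ //'

# With the dot escaped, the pattern no longer matches and the line passes through unchanged.
echo "2x14 Italien" | sed -E 's/^[0-9]+\.[0-9]+ //'
```

The first command prints Italien, the second prints the line untouched.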

So, a solution would be, for GNU sed:

$ echo "2.12 blah" | sed -r 's/^[0-9]+\.[0-9]+ //'
blah

Or for BSD sed:

$ echo "2.12 blah" | sed -E 's/^[0-9]+\.[0-9]+ //'
blah

This way you don't need a different regex for different versions of sed. With your example:

$ cat test
3.15 Chichewa
3.16 Chimane
3.17 Cinghalais
3.18 Créole de Guinée-Bissau

$ sed -r 's/^[0-9]+\.[0-9]+ //' test
Chichewa
Chimane
Cinghalais
Créole de Guinée-Bissau
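If neither -r nor -E can be relied on, the same substitution can probably be written with only POSIX Basic Regular Expressions, spelling "one or more digits" as [0-9][0-9]* instead of [0-9]+. This is a minimal sketch; the function name strip_prefix is only an illustrative choice and does not come from the question.

```shell
# Strip a leading "digits.digits " prefix using only POSIX basic regular expressions.
# [0-9][0-9]* stands in for the ERE [0-9]+, so no -r or -E flag is needed.
strip_prefix() {
  sed 's/^[0-9][0-9]*\.[0-9][0-9]* //'
}

# Only the first space after the number is removed; later spaces are kept.
printf '%s\n' '3.17 Cinghalais' '3.18 Créole de Guinée-Bissau' | strip_prefix
```

Because the pattern is anchored with ^ and has no g flag, spaces later in the line are left alone.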

If the real problem is that you want to get the second column of a whitespace-delimited file, then you're going about this the wrong way. Either use awk, like @Srdjan Grubor says, or use cut:

$ echo "2.12 foo bar baz" | cut -d' ' -f2-
foo bar baz

The -f2- specifies the second and all following columns, so this will basically take the first space as the separator and output the rest.
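For the awk route, the answer being referred to is not reproduced here, so the following is only one possible way to do it, not necessarily what that answer suggests. It deletes everything up to and including the first space and prints the rest of the line as-is.

```shell
# Remove the first field and the single space after it, keeping the remainder verbatim.
# sub() replaces only the first match, so later spaces are untouched.
echo "3.18 Créole de Guinée-Bissau" | awk '{ sub(/^[^ ]* /, ""); print }'
```

Unlike printing $2, this keeps any additional words and their original spacing.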
