Using sed to remove digits and white space from a string

regex sed

I am trying to remove the first occurrence of digit(s), the dot, the second occurrence of digit(s), and the space before the word.

I have come up with this regex:

sed 's/^[0-9]\+.[0-9]\+\s//' input.txt > output.txt

Text sample:

2.14 Italien
2.15 Japonais

My regex does not work unfortunately. There is a problem with the \s but I can't pinpoint what it is…

Can anyone help?

Edit: I need to remove only the first space, because some entries contain spaces, as you can see below:

3.15 Chichewa
3.16 Chimane
3.17 Cinghalais
3.18 Créole de Guinée-Bissau

Best Answer

The command you're using should work as-is with GNU sed. But with BSD sed, which for example comes with OS X, it won't.

The + metacharacter belongs to Extended Regular Expressions (EREs), which you need to enable explicitly: sed -E for BSD sed, sed -r for GNU sed. Recent GNU sed releases also accept -E.

The escaped \+ does work in GNU sed without enabling EREs, but it is a GNU extension and less portable.
You're using the Perl-like \s, which is defined in neither Basic nor Extended Regular Expressions, and sed does not support Perl regular expressions. GNU sed does accept \s as an extension, but it is more portable to put a literal space in your regular expression.
Finally, an unescaped . matches any single character, so your regex would accept any character in that position, not just a dot. Use \. to match a literal dot.
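
To see the effect of the unescaped dot, here is a small sketch. The function names loose and strict are made up for this illustration, and it assumes a sed that accepts -E (BSD sed, and GNU sed in current releases):

```shell
# Unescaped dot: "." matches any single character, so "2x14 " is stripped too.
loose()  { sed -E 's/^[0-9]+.[0-9]+ //'; }
# Escaped dot: only a literal "." between the two digit runs matches.
strict() { sed -E 's/^[0-9]+\.[0-9]+ //'; }

echo '2x14 Italien' | loose    # prints: Italien
echo '2x14 Italien' | strict   # prints: 2x14 Italien (unchanged)
echo '2.14 Italien' | strict   # prints: Italien
```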

So, a solution would be, for GNU sed:

$ echo "2.12 blah" | sed -r 's/^[0-9]+\.[0-9]+ //'
blah

Or for BSD sed:

$ echo "2.12 blah" | sed -E 's/^[0-9]+\.[0-9]+ //'
blah

This way you don't need a different regex for different versions of sed. With your example:

$ cat test
3.15 Chichewa
3.16 Chimane
3.17 Cinghalais
3.18 Créole de Guinée-Bissau

$ sed -r 's/^[0-9]+\.[0-9]+ //' test
Chichewa
Chimane
Cinghalais
Créole de Guinée-Bissau
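
If you cannot rely on either -E or -r, the same substitution can probably be written with basic regular expression syntax only. This is a sketch, not something taken from the question: strip_prefix is a hypothetical helper name, and [[:space:]] is the POSIX character class used here in place of \s so that a tab separator is also handled.

```shell
# Hypothetical helper: strip a leading "digits.digits" prefix plus one whitespace
# character, using only POSIX basic regular expression syntax (no -E or -r).
strip_prefix() {
  # [0-9][0-9]* stands in for [0-9]+, and [[:space:]] for \s.
  sed 's/^[0-9][0-9]*\.[0-9][0-9]*[[:space:]]//'
}

printf '%s\n' '3.15 Chichewa' '3.18 Créole de Guinée-Bissau' | strip_prefix
# prints:
#   Chichewa
#   Créole de Guinée-Bissau
```

Only the first whitespace character after the number is removed, so spaces inside the remaining text are left alone.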

If the real problem is that you want to get the second column of a whitespace-delimited file, then you're going about this the wrong way. Either use awk, like @Srdjan Grubor says, or use cut:

$ echo "2.12 foo bar baz" | cut -d' ' -f2-
foo bar baz

The -f2- specifies the second and all following columns, so this will basically take the first space as the separator and output the rest.
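
The awk answer mentioned above is not reproduced here. One possible awk equivalent of the cut command, as a sketch with a hypothetical helper name (rest_of_line), would be:

```shell
# Hypothetical helper: drop everything up to and including the first space.
rest_of_line() {
  # sub() replaces only the first match, so later spaces are preserved.
  awk '{ sub(/^[^ ]* /, ""); print }'
}

echo '2.12 foo bar baz' | rest_of_line   # prints: foo bar baz
```

Like cut without -s, this passes lines that contain no space through unchanged.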

Related Solutions

How to extract a version number using sed

Try the following sed command:

$ echo "Version 1.2.4.1 (release mode)" | sed -ne 's/[^0-9]*\(\([0-9]\.\)\{0,4\}[0-9][^.]\).*/\1/p'
1.2.4.1

It uses the \{i,j\} interval syntax, which matches the preceding expression between i and j times. There shouldn't be any digits in the string before the version number.

More examples:

$ echo "Version 1.2.4.1.6 (release mode)" | sed -ne 's/[^0-9]*\(\([0-9]\.\)\{0,4\}[0-9][^.]\).*/\1/p'
1.2.4.1.6 
$ echo "Version 1.2 (release mode)" | sed -ne 's/[^0-9]*\(\([0-9]\.\)\{0,4\}[0-9][^.]\).*/\1/p'
1.2 
$ echo "Version 1.2. (release mode)" | sed -ne 's/[^0-9]*\(\([0-9]\.\)\{0,4\}[0-9][^.]\).*/\1/p'
$

Edit in response to comments:

$ echo "Version 1.2.4.1.90 (release mode)" | sed -nre 's/^[^0-9]*(([0-9]+\.)*[0-9]+).*/\1/p'
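
The same expression can be wrapped in a small function for reuse. This is a sketch: extract_version is a hypothetical name, and -E is used instead of -r on the assumption that your sed accepts it (BSD sed, and GNU sed in current releases).

```shell
# Hypothetical helper: print the first dotted version number found on each input line.
extract_version() {
  # -n together with the p flag prints only lines where the substitution succeeded.
  sed -nE 's/^[^0-9]*(([0-9]+\.)*[0-9]+).*/\1/p'
}

echo 'Version 1.2.4.1.90 (release mode)' | extract_version   # prints: 1.2.4.1.90
echo 'Version 1.2. (release mode)' | extract_version         # prints: 1.2
```

Unlike the \{0,4\} variant, this form accepts multi-digit components and any number of them, and it does not capture the character that follows the version.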

How to match digits followed by a dot using sed

Because sed is not Perl, sed regexes do not have a \d shorthand. Use the [[:digit:]] character class instead:

sed 's/[[:digit:]]\+\.//g'

See the sed regular expression documentation for details.
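
Since \+ is a GNU extension in basic regular expressions, a more portable sketch repeats the class instead. The helper name strip_numbered is hypothetical:

```shell
# Hypothetical helper: delete every run of digits that is immediately followed by a dot.
strip_numbered() {
  # [[:digit:]][[:digit:]]* means "one or more digits" without relying on \+.
  sed 's/[[:digit:]][[:digit:]]*\.//g'
}

echo '12.abc 3.def' | strip_numbered   # prints: abc def
```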
