sed -e 's/^[0-9]//' does not work for the first line

regular expression, sed, text processing

The following is the text I want to process with sed (Mac OS X 10.11.1, bash):

1
00:25:43,959 --> 00:25:46,502
Here you are, sir.
Main level, please.

I can delete the 1 on the first line with sed -e 's/[0-9]//'.

But with sed -e 's/^[0-9]//', the 1 on the first line remains.
Since 1 is at the beginning of the first line, shouldn't it be deleted?

head -n1 2001.srt | od -c

0000000  357 273 277   1  \n
0000005

I just created a new text file starting with "1":
head -n1 2002.srt | od -c

0000000    1  \n
0000002

sed -e 's/^[0-9]//' works for this newly created file.

Yes, there's something before "1".

Best Answer

Your file starts with a UTF-8 byte order mark (BOM). It is the Unicode character U+FEFF, which is encoded as three bytes in UTF-8. Those three bytes show up as 357 273 277 when they are printed in octal, as od -c does.
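A quick way to check a file for the BOM is to dump only its first three bytes. This is a minimal sketch: the file name bom-demo.txt is made up for the illustration, and the file is created with printf purely so that the example is self-contained.

```shell
# Write a line that begins with the UTF-8 BOM (octal 357 273 277 = hex EF BB BF).
# bom-demo.txt is a hypothetical file name used only for this illustration.
printf '\357\273\2771\n' > bom-demo.txt

# Dump just the first three bytes as octal; -An suppresses the offset column,
# and tr removes the whitespace od puts between the bytes.
first_bytes=$(od -An -t o1 -N 3 bom-demo.txt | tr -d '[:space:]')

# A file that starts with a BOM yields 357273277 here.
if [ "$first_bytes" = "357273277" ]; then
    echo "BOM present"
else
    echo "no BOM"
fi
```

Run against a file created without the BOM, such as the second test file in the question, the same check should take the "no BOM" branch.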

To the sed command, those bytes at the start of the line mean that 1 is in fact not the first character on that line. Many other tools will treat it the same way.

You need to remove the BOM before doing other processing in order to get a useful result. For instance, you could start your sed script with s/^\xef\xbb\xbf// to remove the BOM. Note that \x escapes are a GNU sed extension, so other sed implementations may not accept them. Your full command would then become:

sed -e 's/^\xef\xbb\xbf//;s/^[0-9]//'
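If your sed does not understand \x escapes, one possible workaround is to let printf produce the three bytes and place them in the script literally. The following is only a sketch under that assumption: sample.srt and the variable names bom and result are invented for the example, and the BOM removal is restricted to line 1, since that is the only place a BOM should appear.

```shell
# Sample input beginning with a UTF-8 BOM; sample.srt is a hypothetical name.
printf '\357\273\2771\nHere you are, sir.\n' > sample.srt

# Have printf emit the three BOM bytes so the sed script contains them
# literally and does not depend on sed understanding \x escapes.
bom=$(printf '\357\273\277')

# Strip the BOM from line 1 only, then delete a leading digit on every line.
result=$(sed -e "1s/^${bom}//" -e 's/^[0-9]//' sample.srt)

# Prints an empty first line followed by "Here you are, sir."
printf '%s\n' "$result"
```

Piping the output through od -c, as in the question, is a simple way to confirm that the 357 273 277 bytes are gone.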