sed -e 's/^[0-9]//' does not work for the first line

regular expression, sed, text processing

The following is the text I want to parse with sed (Mac OS X 10.11.1 bash)

1
00:25:43,959 --> 00:25:46,502
Here you are, sir.
Main level, please.

I can delete the 1 on the first line with sed -e 's/[0-9]//'.

But with sed -e 's/^[0-9]//', the 1 on the first line remains.
Since 1 is at the beginning of the first line, shouldn't it be deleted?

head -n1 2001.srt | od -c

0000000  357 273 277   1  \n
0000005

Just created a new text file starting with "1".
head -n1 2002.srt | od -c

0000000    1  \n
0000002

sed -e 's/^[0-9]//' works for this newly created file.

Yes, there's something before "1".

Best Answer

Your file starts with a UTF-8 byte order mark (BOM). It is the Unicode character U+FEFF, which is encoded as three bytes in UTF-8. Those three bytes show up as 357 273 277 when od prints them in octal.

To the sed command, those bytes at the start of the line mean that 1 is in fact not the first character on that line, so ^[0-9] has nothing to match. Many other tools will treat it the same way.
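The effect is easy to reproduce without the original file. The sketch below builds two one-line inputs with printf, one with the three BOM bytes (octal 357 273 277) in front of the 1 and one without, and runs the anchored substitution on each. The variable names are only illustrative.

```shell
# One-line input with a UTF-8 BOM in front of the digit.
with_bom=$(printf '\357\273\2771\n' | sed -e 's/^[0-9]//')
# The same input without the BOM.
without_bom=$(printf '1\n' | sed -e 's/^[0-9]//')

# With the BOM the digit survives, because ^ is followed by the BOM, not by 1.
printf '%s\n' "$with_bom" | od -c
# Without the BOM the digit is removed and nothing is left between the brackets.
printf 'without BOM: [%s]\n' "$without_bom"
```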

You need to remove the BOM before doing other processing in order to get a useful result. For instance, you could start your sed script with s/^\xef\xbb\xbf// to remove the BOM. Note that the \xHH escapes are a GNU sed extension, so this assumes a sed that understands them. Your full command would then become

sed -e 's/^\xef\xbb\xbf//;s/^[0-9]//'
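If the sed at hand does not interpret \xHH escapes, one possible workaround is to let the shell produce the raw bytes and splice them into the script. This is a sketch under that assumption, not part of the original answer; the bom variable and the two-line sample input are made up for illustration, and the substitution is restricted to line 1 because a BOM only appears at the start of the file.

```shell
# Build the three BOM bytes with printf so that sed receives them as raw bytes.
bom=$(printf '\357\273\277')

# Hypothetical sample: a BOM, the subtitle counter, then a line of dialogue.
result=$(printf '\357\273\2771\nHere you are, sir.\n' |
  sed -e "1s/^${bom}//" -e 's/^[0-9]//')

# The first line is now empty; the second line is unchanged.
printf '%s\n' "$result"
```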

Related Solutions

Bash – Need to insert single quotes in text file for use as SQL query using sed

There are four ways to include the single quote that you need.

One cannot escape a single quote within a single-quoted string. However, one can end the quoted string, insert an escaped single quote, and then start a new single-quoted string. Thus, to put a single quote in the middle of 'ab', use: 'a'\''b'. Or, using the sed command that you need:

$ sed -r 's/,([^ ),]+)/,'\''\1'\''/g; s/,,/,'\'\'',/g' file
INSERT INTO radcheck(id, username, attribute, op, value) VALUES (,'','00:23:32:c2:a9:e8','Auth-Type',':=','Accept');
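To see the close-escape-reopen pattern in isolation, here is a minimal sketch; the strings and variable names are arbitrary examples rather than anything from the question.

```shell
# Close the quoted string, add an escaped quote, reopen the quoted string.
quoted='a'\''b'
printf '%s\n' "$quoted"

# The same idea inside a sed script: wrap the letter b in single quotes.
wrapped=$(printf 'a,b\n' | sed -e 's/b/'\''b'\''/')
printf '%s\n' "$wrapped"
```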

The second way is to use a double-quoted string, in which case the single-quote can be inserted easily:

$ sed -r "s/,([^ ),]+)/,'\1'/g; s/,,/,'',/g" file
INSERT INTO radcheck(id, username, attribute, op, value) VALUES (,'','00:23:32:c2:a9:e8','Auth-Type',':=','Accept');

The issue with double-quoted strings is that the shell processes them first, expanding things like $variables, backticks, and some backslash sequences. Here, though, there are no shell-active characters, so it is easy.
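The difference becomes visible once a script contains something the shell reacts to. The following sketch prints the script text that sed would receive under each quoting style; the item variable and the s/$item/.../ script are hypothetical and chosen only to trigger expansion.

```shell
item=apple   # hypothetical shell variable that happens to be set

# Single quotes: the shell passes $item through untouched.
single=$(printf '%s\n' 's/$item/'\''x'\''/')
# Double quotes: the shell substitutes the variable before sed would run.
double=$(printf '%s\n' "s/$item/'x'/")
# Escaping the dollar sign keeps it literal inside double quotes.
escaped=$(printf '%s\n' "s/\$item/'x'/")

printf 'single-quoted script: %s\n' "$single"
printf 'double-quoted script: %s\n' "$double"
printf 'escaped dollar sign:  %s\n' "$escaped"
```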

The third method is to use a hex escape as PM2Ring demonstrates.

The fourth way, suggested in the comments by Jonathan Leffler, is to place the sed commands in a separate file:

$ cat script.sed 
s/,([^ ),]+)/,'\1'/g
s/,,/,'',/g
$ sed -rf script.sed file
INSERT INTO radcheck(id, username, attribute, op, value) VALUES (,'','00:23:32:c2:a9:e8','Auth-Type',':=','Accept');

This way has the strong advantage that sed reads the commands directly without any interference from the shell. Consequently, this completely avoids the need to escape shell-active characters and allows the commands to be entered in pure sed syntax.
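A self-contained version of the script-file approach might look like the sketch below. The input line is an assumption reconstructed from the output shown above (the original input file is not reproduced in the answer), the temporary directory is illustrative, and -E is used instead of -r on the assumption that the installed sed accepts it.

```shell
workdir=$(mktemp -d)

# The quoted 'EOF' delimiter stops the shell from touching the script text.
cat > "$workdir/script.sed" <<'EOF'
s/,([^ ),]+)/,'\1'/g
s/,,/,'',/g
EOF

# Assumed input line, inferred from the output shown in the answer.
printf '%s\n' 'INSERT INTO radcheck(id, username, attribute, op, value) VALUES (,,00:23:32:c2:a9:e8,Auth-Type,:=,Accept);' > "$workdir/file"

# -E selects extended regular expressions; GNU sed also accepts -r.
out=$(sed -E -f "$workdir/script.sed" "$workdir/file")
printf '%s\n' "$out"
rm -r "$workdir"
```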

How the `sed` solution works

The trick is to put single quotes around the comma-separated strings that you want but not around the others. Based on the single example that you gave, here is one approach:

s/,([^ ),]+)/,'\1'/g

This looks for one or more characters that follow a comma and are not a space, a comma, or a closing parenthesis. These characters are placed inside single quotes.

s/,,/,'',/g

This looks for two consecutive commas and places two single quotes between them.
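Running the two substitutions one at a time shows what each contributes. The input here is a hypothetical miniature of the VALUES list, and -E is assumed to be available in the installed sed.

```shell
input='(,,ab,cd)'   # hypothetical miniature of the VALUES list

# Step 1: quote each run of non-space, non-comma, non-")" characters after a comma.
step1=$(printf '%s\n' "$input" | sed -E -e "s/,([^ ),]+)/,'\1'/g")
# Step 2: fill the empty field between two consecutive commas with ''.
step2=$(printf '%s\n' "$step1" | sed -E -e "s/,,/,'',/g")

printf '%s\n%s\n' "$step1" "$step2"
```

The leading comma after the opening parenthesis is left alone by the first step because the character after it is another comma, which the bracket expression excludes.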

OSX and other BSD platforms

To avoid extra backslashes, the above sed expressions use extended regular expressions. With GNU sed, these are enabled with -r but, with BSD sed, they are enabled with -E. Also, some non-GNU sed implementations do not accept multiple commands separated by semicolons. Thus, on OSX, try:

sed -E -e "s/,([^ ),]+)/,'\1'/g" -e "s/,,/,'',/g" file

Addendum: Matching a MAC address

From the comments, we have the following input:

$ cat file3
 INSERT INTO radcheck(username, attribute, op, value) VALUES (00:23:32:c2:a9:e8,'Auth-Type',':=','Accept');

We want to put single quotes around the MAC address that follows the opening parenthesis. To do that:

$ sed -r "s/\(([[:xdigit:]:]+)/('\1'/" file3
 INSERT INTO radcheck(username, attribute, op, value) VALUES ('00:23:32:c2:a9:e8','Auth-Type',':=','Accept');

In any locale, [:xdigit:] will match any hexadecimal digit. Thus, ([[:xdigit:]:]+) will match any run of hex digits and colons, which includes a MAC address.
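Because the bracket expression accepts any mix of hex digits and colons, it is looser than a real MAC address check. The sketch below contrasts it with a stricter pattern of six colon-separated hex pairs; the stricter pattern and both function names are assumptions added for illustration, not part of the original answer.

```shell
# Loose test: the whole string consists of hex digits and colons only.
is_hex_or_colon() {
  printf '%s\n' "$1" | grep -Eq '^[[:xdigit:]:]+$'
}

# Stricter test (an assumption, not from the answer): six colon-separated hex pairs.
is_mac() {
  printf '%s\n' "$1" | grep -Eq '^([[:xdigit:]]{2}:){5}[[:xdigit:]]{2}$'
}

for s in 00:23:32:c2:a9:e8 Auth-Type dead:beef; do
  if is_hex_or_colon "$s"; then loose=yes; else loose=no; fi
  if is_mac "$s"; then strict=yes; else strict=no; fi
  printf '%-18s loose=%s strict=%s\n' "$s" "$loose" "$strict"
done
```

For the single example in the question the loose form is enough, since the MAC address is the only field that directly follows the opening parenthesis.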

Why is sed changing permissions of a file on a cifs mounted share

I'm not sure this is related, but I remember that AD users have an option to specify a "Main group" in the user's group membership "if you have mac clients or posix apps".