Sed, convert single backslash to double backslash

regular expression, sed, text processing

I have a JSON string that contains a mix of doubly escaped and singly escaped newline characters. The JSON parser doesn't allow string values to contain single-backslash escapes.

I need to convert all of them uniformly to double escapes.

The content looks like this:

this newline must not be changed ---- \\n
this newline must be changed - \n

When I run the sed command,

 sed -i 's/([^\])\n/\\n/g' ~/Desktop/sedTest

it does not replace anything.

The ([^\]) pattern is meant to skip any \n that is already preceded by another backslash.

Best Answer

Try

sed -i 's,\([^\\]\)\\n,\1\\\\n,g' file
sed -i 's,\([^\]\)\\n,\1\\\\n,g' file

where

\ must be escaped as \\ (so \\\\ in the replacement produces two backslashes)
\( .. \) is the capture pattern
\1 on the right-hand side is the first captured pattern.
The second form uses a single \ in [^\], as per @cuonglm's suggestion.

You need to capture the character before \n and put it back with \1, or it will be discarded.
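
As a quick check, the first command can be run on the two sample lines from the question. This is a minimal sketch: the `input` and `fixed` variable names are made up for illustration, the text is piped through sed instead of being edited in place with -i, and the g flag is included so that every match on a line is replaced.

```shell
# Hypothetical sample input: one already-doubled escape, one single escape.
# Single quotes keep the backslashes literal in the shell.
input='this newline must not be changed ---- \\n
this newline must be changed - \n'

# \([^\\]\) captures the non-backslash character before \n,
# \\n matches a literal backslash followed by n,
# \1\\\\n writes the captured character back, then two backslashes and n.
fixed=$(printf '%s\n' "$input" | sed 's,\([^\\]\)\\n,\1\\\\n,g')

printf '%s\n' "$fixed"
```

Because the pattern requires a character before \n, a \n at the very start of a line, or one that immediately follows another replaced \n, is left unchanged.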

Related Solutions

How to keep a part of the pattern matched and use it to replace on BSD sed

sed 's,\([a-z]\)1\.gif$,\1.gif,g'

or, if you want to allow any non-digit before the 1

sed 's,\([^0-9]\)1\.gif$,\1.gif,g'

The backslash-parenthesis construct delimits a capture group, which the FreeBSD man page calls a “bracket expression” (despite the use of parentheses — square brackets mean something else). Note that sed uses basic regular expressions (BRE), not extended regular expressions (ERE); the man page describes ERE, and the last paragraph explains the difference between BRE syntax and ERE syntax. I find the POSIX specification more readable than the BSD man page here; it calls capture groups back-reference expressions. The GNU sed manual is more readable than either; just avoid the features described as GNU extensions.

Given a capture group (a.k.a. back-reference expression), you can use backslash+digit in the replacement text to mean “the text matched by the corresponding capture group”. For example, \1 in the replacement text is replaced by the text matched by the first capture group in the regular expression. Here there's a single capture group, which captures the letter before 1.gif.

I changed 1.gif to 1\.gif to match the dot literally, and added a trailing $ to match only at the end of the line.
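
To see the capture group and the $ anchor at work, the first command can be tried on a few file names. This is only a sketch; the names and the `names` and `result` variables are invented for illustration.

```shell
# Hypothetical file names, one per line.
names='a1.gif
b21.gif
c1.gif.bak'

# Only a lowercase letter directly followed by "1.gif" at the end of the line matches.
result=$(printf '%s\n' "$names" | sed 's,\([a-z]\)1\.gif$,\1.gif,g')

printf '%s\n' "$result"
```

Only the first name changes (to a.gif): in b21.gif the character before 1 is a digit, and in c1.gif.bak the 1.gif is not at the end of the line.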

To give another example of capture groups, if you wanted to operate on arbitrary extensions, you could use something like

sed 's,\([^0-9]\)1\(\.[^./]*\)$,\1\2,g'
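
A small trial run of this two-group form, again with invented file names and variable names, might look like this:

```shell
# Hypothetical file names, one per line.
files='photo1.png
archive1.tar.gz
img21.jpg'

# \1 is the non-digit before the 1; \2 is the final extension (a dot plus
# characters that are neither a dot nor a slash, up to the end of the line).
renamed=$(printf '%s\n' "$files" | sed 's,\([^0-9]\)1\(\.[^./]*\)$,\1\2,g')

printf '%s\n' "$renamed"
```

Here only photo1.png is rewritten (to photo.png). In archive1.tar.gz the 1 is not directly before the last extension, and in img21.jpg the character before 1 is a digit.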

Grab text out of vtt file

Since your file appears to consist of a sequence of records separated by one or more blank lines, I'd suggest trying something based on the paragraph modes of either awk or perl.

For example, if you always need to strip off the first two lines, like

1
00:00:00.096 --> 00:00:05.047

you could split into newline-delimited fields within blank-separated paragraphs and skip the first two fields using either

awk -vRS= -vORS= -F'\n' '{for(j=3;j<=NF;j++) print $j; print " "}' file.vtt

perl -F'\n' -00ane 'print join("", @F[2..$#F]), " "' file.vtt

If you can't rely on there being a fixed number of fields (lines) to be removed, then it's fairly easy to add a regular expression test - a little easier in perl since it allows us to grep directly on arrays rather than writing an explicit loop. For example, to split into blank-separated records and then print only those fields (lines) having at least one sequence of at least 3 alphabetic characters, you could use

perl -F'\n' -00ane '
  print join("", grep { /[[:alpha:]]{3}/ } @F), " "
' file.vtt

If you want to exclude the WEBVTT string you can simply skip the first record, i.e.

perl -F'\n' -00ane '
  print join("", grep { /[[:alpha:]]{3}/ } @F), " " if $. > 1
' file.vtt

It will be down to you to choose a suitable regex that captures the wanted lines and excludes the unwanted ones. You can add an END block in either awk or perl if you want to add a final newline to the concatenated output.
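
The awk variant with an END block might look like the following sketch. The sample cue text and the `vtt_text` function name are invented for illustration, and the NR > 1 test is an assumed awk counterpart to the $. > 1 test in the perl command, used to skip the WEBVTT header record.

```shell
# Hypothetical helper: reads a VTT file on stdin, prints the cue text on one line.
vtt_text() {
  awk -v RS= -v ORS= -F'\n' '
    # Skip record 1 (the WEBVTT header); drop the cue number and timing lines.
    NR > 1 { for (j = 3; j <= NF; j++) print $j; print " " }
    # ORS is empty, so emit the final newline explicitly.
    END    { print "\n" }
  '
}

# Made-up two-cue sample, fed through the helper.
text=$(printf 'WEBVTT\n\n1\n00:00:00.096 --> 00:00:05.047\nhello world\n\n2\n00:00:05.047 --> 00:00:08.000\nsecond cue\n' | vtt_text)

printf '%s\n' "$text"
```

Each record still ends with a trailing space, as in the commands above; the END block only adds the closing newline.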

NOTE: since (based on the discussion in comments) your files appear to have DOS-style CRLF line endings, you will need to deal with those - either by modifying the field and record separators in the above commands accordingly, or by stripping out the CRs first e.g.

sed 's/\r$//' file.vtt | 
  perl -F'\n' -00ane '
    print join("", grep { /[[:alpha:]]{3}/ } @F), " " if $. > 1
  '

Sample output:

you're the four functions if you would of management first of all you have the planning the planning stages basically you were choosing appropriate  organizational goals and courses action to best achieve those goals
