Using a table or script in sed to replace many special characters with escape characters

If you want to replace special characters using sed there are several ways to do it, but my problem is that I have to replace many (100+) special characters with their escaped forms in many files.

The escapes needed are (thanks, Peter):

^^ to escape a single ^
^| to escape |
\& to escape &
\/ to escape /
\\ to escape \

Suppose there are 100+ example strings like these, spread over many files:

sed.exe -i "s/{\*)(//123/
sed -i "s/\\/123/g;" 1.txt
sed.exe -i "s/{\*)(//123/
sed -i "s/\\/123/g;" 1.txt
.....
.....

These strings contain many special characters that need escaping (there are 100+ strings).
Escaping them by hand takes far too long, so I need a table-driven script, similar to wReplace, that I can call from the command prompt to escape the special characters and then replace the strings with my own words.
How can I do this?

Best Answer

Note that ^^ for ^, ^| for |, ^& for &, and so on are not a requirement of sed. The ^ escape character is required by the CMD shell. If your text is exposed to neither the command line nor a command parameter in a .cmd/.bat command script, you only need to consider sed's own escape character, which is the backslash \. These are two quite separate scopes (which can overlap), so it is often better to keep everything within sed's scope, as the following does.
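As a small illustration of keeping everything within sed's scope, here is a sketch that assumes a POSIX shell rather than CMD; the file name demo.sed and the sample text are made up for the example. Because the expression lives in a script file, the only escaping involved is sed's own backslash.

```shell
# Assumption: POSIX shell, not CMD. demo.sed and the sample text are hypothetical.
workdir=$(mktemp -d)

# The quoted heredoc delimiter ('EOF') stops the shell from touching the text,
# so only sed's escapes apply: \/ is a literal slash, \& a literal ampersand.
cat > "$workdir/demo.sed" <<'EOF'
s/a\/b/x \& y/g
EOF

result=$(printf '%s\n' 'a/b a/b' | sed -f "$workdir/demo.sed")
printf '%s\n' "$result"   # prints: x & y x & y

rm -r "$workdir"
```

The same script file could be created in a text editor on Windows, in which case no CMD ^ escaping is involved at all.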

Here is a sed script which will replace any number of find-strings you specify with their corresponding replacement-strings. The general format of the strings is a cross between a sed substitution command (s/abc/xyz/p) and a tabular format. You can "stretch" the middle delimiter so that you can line things up.
You can use a FIXED string pattern (F/...) or a normal sed-style regular expression pattern (s/...), and you can adjust sed -n and each /p (in table.txt) as needed.

You need 3 files for a minimal run (and a 4th, dynamically derived from table.txt):

the main script table-to-regex.sed
the table file table.txt
the target file file-to-change.txt
the derived script table-derrived.sed

To run one table against one target file:

sed -nf table-to-regex.sed  table.txt > table-derrived.sed
# Here, check `table-derrived.sed` for errors as described in the example *table.txt*.  

sed -nf table-derrived.sed  file-to-change.txt
# Redirect *sed's* output via `>` or `>>` as need be, or use `sed -i -nf`

If you want to run table.txt against many files, just put the above code snippet into a simple loop to suit your requirements. I can do it trivially in bash, but someone more aware of the Windows CMD-shell would be better suited than I to set that up.
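For a POSIX shell or bash, such a loop might look like the sketch below. The helper name apply_table is hypothetical, and the sketch assumes the derived script's lines carry no p flag, so that plain sed -f (without -n) passes unmatched lines through unchanged. It writes to a temporary file and then renames it, rather than relying on sed -i.

```shell
# Sketch: apply one derived sed script to several files.
# apply_table is a hypothetical helper, not part of the scripts in this answer.
apply_table() {
  script=$1
  shift
  for f in "$@"; do
    # Write to a temporary file first, then replace the original,
    # so a failed sed run never leaves a truncated target behind.
    sed -f "$script" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
  done
}
```

Usage would be along the lines of: apply_table table-derrived.sed 1.txt 2.txt 3.txt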

Here is the script: table-to-regex.sed

s/[[:space:]]*$//  # remove trailing whitespace

/^$\|^[[:space:]]*#/{p; b}  # empty and sed-style comment lines: print and branch
                            # printing keeps line numbers; for referencing errors

/^\([Fs]\)\(.\)\(.*\2\)\{4\}/{  # too many delims ERROR
      s/^/# error + # /p        # print a flagged/commented error
      b }                       # branch

/^\([Fs]\)\(.\)\(.*\2\)\{3\}/{                  # this may be a long-form 2nd delimiter
   /^\([Fs]\)\(.\)\(.*\2[[:space:]]*\2.*\2\)/{  # is long-form 2nd delimiter OK?
      s/^\([Fs]\)\(.\)\(.*\)\2[[:space:]]*\2\(.*\)\2\(.*\)/\1\2\n\3\n\4\n\5/
      t OK                                      # branch on true to :OK
   }; s/^/# error L # /p                        # print a flagged/commented error
      b }                                       # branch: long-form 2nd delimiter ERROR

/^\([Fs]\)\(.\)\(.*\2\)\{2\}/{     # this may be short-form delimiters
   /^\([Fs]\)\(.\)\(.*\2.*\2\)/{   # is short-form delimiters OK?
      s/^\([Fs]\)\(.\)\(.*\)\2\(.*\)\2\(.*\)/\1\2\n\3\n\4\n\5/
      t OK                         # branch on true to :OK  
   }; s/^/# error S # /p           # print a flagged/commented error
      b }                          # branch: short-form delimiters ERROR

{ s/^/# error - # /p        # print a flagged/commented error
  b }                       # branch: too few delimiters ERROR

:OK     # delimiters are okay
#============================
h   # copy the pattern-space to the hold space

# NOTE: /^s/ lines are considered to contain regex patterns, not FIXED strings.
/^s/{    s/^s\(.\)\n/s\1/   # shrink long-form delimiter to short-form
     :s; s/^s\(.\)\([^\n]*\)\n/s\1\2\1/; t s  # branch on true to :s 
      p; b }                                  # print and branch

# The following code handles FIXED-string /^F/ lines

s/^F.\n\([^\n]*\)\n.*/\1/  # isolate the literal find-string in the pattern-space
s/[]\/$*.^[]/\\&/g         # convert the literal find-string into a regex of itself
                           # ('|' is left alone: \| means alternation in GNU sed)
H                          # append \n + find-regex to the hold-space

g   # Copy the modified hold-space back into the pattern-space

s/^F.\n[^\n]*\n\([^\n]*\)\n.*/\1/  # isolate the literal repl-string in the pattern-space
s/[\/&]/\\&/g                      # convert the literal repl-string into a regex of itself
H                                  # append \n + repl-regex to the hold-space

g   # Copy the modified hold-space back into the pattern-space

# Rearrange pattern-space into a / delimited command: s/find/repl/...      
s/^\(F.\)\n\([^\n]*\)\n\([^\n]*\)\n\([^\n]*\)\n\([^\n]*\)\n\([^\n]*\)$/s\/\5\/\6\/\4/

p   # Print the modified find-and-replace regular expression line
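The core of the FIXED-string handling can be tried in isolation. The following is a minimal sketch, not an excerpt from the script: fixed_to_sed is a hypothetical helper, and it uses ',' as the delimiter of the escaping commands themselves (my choice) so that '/' and '\' inside the bracket expressions need no special treatment.

```shell
# Sketch: escape a literal find-string and a literal replacement-string,
# then join them into one sed substitution command.
# fixed_to_sed is a hypothetical helper name.
fixed_to_sed() {
  # Find side: escape ] \ / $ * . ^ [ so each matches only itself.
  find_re=$(printf '%s\n' "$1" | sed 's,[]\/$*.^[],\\&,g')
  # Replacement side: only \ / and & are special there.
  repl_re=$(printf '%s\n' "$2" | sed 's,[\/&],\\&,g')
  printf 's/%s/%s/g\n' "$find_re" "$repl_re"
}

fixed_to_sed 'a.b*c/d' 'x&y/z'   # prints: s/a\.b\*c\/d/x\&y\/z/g
```

The printed line can be passed to sed directly or collected into a script file, which is what the derived script amounts to for F lines.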

Here is an example table file, with a description of how it works: table.txt

# The script expects an input table file, which can contain 
#   comment, blank, and substitution lines. The text you are
#   now reading is part of an input table file.

# Comment lines begin with optional whitespace followed by #

# Each substitution line must start with: 's' or 'F'
#  's' lines are treated as normal `sed` substitution regular expressions
#  'F' lines are considered to contain `FIXED` (literal) string expressions 
# The 's' or 'F' must be followed by the 1st of 3 delimiters   
#   which must not appear elsewhere on the same line.
# A pre-test is performed to ensure conformity. Lines with 
#   too many or too few delimiters, or no 's' or 'F', are flagged   
#   with the text '# error ? #', which effectively comments them out.
#   '?' can be: '-' too few, '+' too many, 'L' long-form, 'S' short-form
#   Here is an example of a long-form error, as it appears in the output. 

# error L # s/example/(7+3)/2=5/

# 1st delimiter, eg '/' must be a single character.
# 2nd (middle) delimiter has two possible forms:
#   Either it is exactly the same as the 1st delimiter: '/' (short-form)
#   or it has a double-form for column alignment: '/      /' (long-form)
#   The long-form can have any amount of whitespace between the 2 '/'s   
# 3rd delimiter must be the same as the 1st delimiter.

# After the 3rd delimiter, you can put any of sed's 
#    substitution commands, eg. 'g'

# With one condition, a trailing '#' comment to 's' and 'F' lines is
#    valid. The condition is that no delimiter character can be in the 
#    comment (delimiters must not appear elsewhere on the same line)

# For 's' type lines, it is implied that *you* have included all the 
#    necessary sed-escape characters!  The script does not add any 
#    sed-escape characters for 's' type lines. It will, however, 
#    convert a long-form middle-delimiter into a short-form delimiter.   

# For 'F' type lines, it is implied that both strings (find and replace) 
#    are FIXED/literal-strings. The script does add the  necessary 
#    sed-escape characters for 'F' type lines. It will also 
#    convert a long-form middle-delimiter into a short-form delimiter.   

# The result is a sed-script which contains one sed-substitution 
#    statement per line; it is just a modified version of your 
#    's' and 'F' strings "table" file.

# Note that the 1st delimiter is *always* in column 2.

# Here are some sample 's' and 'F' lines, with comments:
#

F/abc/ABC/gp               #-> These 3 are the same for 's' and 'F', 
s/abc/ABC/gp               #-> as no characters need to be escaped,  
s/abc/         /ABC/gp     #-> and the 2nd delimiter shrinks to one  

F/^F=Fixed/    /\1okay/p   # \1 is okay here, It is a FIXED literal
s|^s=sed regex||\1FAIL|p   # \1 will FAIL: back-reference not defined!

F|\\\\|////|               # this line == next line 
F|\\\\|        |////|p     # this line == previous line  
s|\\\\|        |////|p     # this line is different; 's' vs 'F'

F_Hello! ^.&`//\\*$/['{'$";"`_    _Ciao!_   # literal find / replace
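To make the long-form shrink described in the table comments concrete, here is a simplified stand-in for that one step. It is a sketch modeled on the idea, not the script's actual code path, and shrink_long_form is a hypothetical name.

```shell
# Sketch: collapse a long-form middle delimiter ('/   /') to the short form ('/').
# \2 is whatever single character follows the leading 's' or 'F'.
shrink_long_form() {
  sed 's/^\([Fs]\)\(.\)\(.*\)\2[[:space:]]*\2\(.*\)\2\(.*\)/\1\2\3\2\4\2\5/'
}

printf '%s\n' 's/abc/         /ABC/gp' | shrink_long_form   # prints: s/abc/ABC/gp
```

A line that is already in short form has only three delimiters, so the pattern does not match and the line passes through unchanged.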

Here is a sample input file whose text you wish to change: file-to-change.txt

abc abc
^F=Fixed
   s=sed regex
\\\\ \\\\ \\\\ \\\\
Hello! ^.&`//\\*$/['{'$";"`
some non-matching text

Related Solutions

Linux – Using sed to replace string with special characters in XML file

Putting shell variables in single quotes disables their interpretation. That's why your command has no effect.

$ echo  's/"$OLD_STRING"/"$NEW_STRING"/g'
s/"$OLD_STRING"/"$NEW_STRING"/g

It should be written like this:

sed -i "s/'$OLD_STRING'/'$NEW_STRING'/g" jboss-beans.xml

But then the variables are expanded before sed is called, and they again contain special characters:

$ echo  "s/'$OLD_STRING'/'$NEW_STRING'/g"
s/'<property name="webServiceHost">${jboss.bind.address}</property>'/'<!--<property name="webServiceHost">${jboss.bind.address}</property>-->'/g

For that reason sed lets you choose a different delimiter for the s command simply by using it, e.g.:

sed -i "s#'$OLD_STRING'#'$NEW_STRING'#g" jboss-beans.xml
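A minimal sketch of the alternate-delimiter idea, using made-up values (OLD_PATH, NEW_PATH and the sample line are hypothetical). It assumes the values contain no '#' and no regex metacharacters that matter for the match.

```shell
# Sketch: '#' as the s-command delimiter, so slashes in the values need no escaping.
OLD_PATH='/usr/local/bin'
NEW_PATH='/opt/tools/bin'
changed=$(printf '%s\n' 'exec /usr/local/bin/run' | sed "s#$OLD_PATH#$NEW_PATH#g")
printf '%s\n' "$changed"   # prints: exec /opt/tools/bin/run
```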

Still, your search expression contains special regexp characters, and using sed like this is a waste of its abilities. I would write the expression like this:

sed -i 's/\(<.*webServiceHost.*jboss.bind.address.*>\)/<!--\1-->/' jboss-beans.xml

Of course, you can make the match string more or less specific according to your needs. There is another nice feature that can help: sed can restrict an editing operation to the lines matching a specific pattern. Your command could look like this:

sed -i '/webServiceHost/ s/^\(.*\)$/<!--\1-->/' jboss-beans.xml
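The address-restricted form can be tried without touching a real file. This sketch feeds sed a hypothetical two-line stand-in for jboss-beans.xml; only the line matching /webServiceHost/ is wrapped in a comment.

```shell
# Sketch: the input lines are a made-up stand-in for jboss-beans.xml.
commented=$(printf '%s\n' \
  '<property name="webServiceHost">${jboss.bind.address}</property>' \
  '<property name="webServicePort">8083</property>' |
  sed '/webServiceHost/ s/^\(.*\)$/<!--\1-->/')
printf '%s\n' "$commented"
# prints:
# <!--<property name="webServiceHost">${jboss.bind.address}</property>-->
# <property name="webServicePort">8083</property>
```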

Replace special text with sed

Looking for literal strings with a regular expression, when the search-string contains special characters, is sometimes not as simple as looking for patterns, but you can do it with a bit of juggling.

Note: The echo command must cater for CMD-special-characters, so it needs ^^ to escape a single ^ and ^| to escape | ... You don't need CMD's escape-character ^ if you type directly into the file.

Step 1: Create a file, named literal-srch-strings.txt, containing the exact (unaltered) string to be replaced. There are 2 ways to create this file:

As a command issued at CMD's commandline, or as a command in a .cmd/.bat command-script.

echo sed -i^^/\\*$/$[{" ;"> literal-srch-strings.txt
Make literal-srch-strings.txt yourself, in your text editor.
In this case, you should not use the CMD escape character ^, so the line has just one ^, not ^^, because you are bypassing the CMD shell.
Here is what is needed in the .txt file (just as the filename says :)

sed -i^/\\*$/$[{" ;"

Step 2: Make a sed script, named str-to-regex.sed, to convert the string(s) into sed regex(es).
Note that the same issue of the CMD-escape-character ^ applies to this step, so again, there are 2 ways you can create the .sed file:

As a command:

echo s/[]\/$*.^^[]/\\^&/g; s/.*/s\/^&\/Ciao!\/g/> str-to-regex.sed
Using your text editor, make a file named str-to-regex.sed, containing:

s/[]\/$*.^[]/\\&/g; s/.*/s\/&\/Ciao!\/g/

Step 3: Run the sed script which converts the string into a sed regular expression, and
send its output to another sed script, replace-text.sed, which will make the actual replacement.

sed -f str-to-regex.sed  literal-srch-strings.txt > replace-text.sed

Step 4: Run replace-text.sed -- For the test we can use literal-srch-strings.txt as the input file, but you can, of course, use any input file.

sed -f replace-text.sed  literal-srch-strings.txt

Here is the output:

Ciao!
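For readers on a POSIX shell rather than CMD, the four steps can be sketched end to end as follows. The literal string here is a made-up example, not the one used in the steps, and the converter is a variant that leaves | out of the escaped set because \| means alternation in GNU sed. No ^ escaping is needed, since CMD is not involved.

```shell
# Sketch of steps 1-4 in a POSIX shell. The literal string is hypothetical.
work=$(mktemp -d)

# Step 1: the exact string to be replaced, written verbatim.
cat > "$work/literal-srch-strings.txt" <<'EOF'
cost: $5.00 [sale]*
EOF

# Step 2: the converter: escape regex metacharacters, then wrap the
# result in an s/.../Ciao!/g command.
cat > "$work/str-to-regex.sed" <<'EOF'
s/[]\/$*.^[]/\\&/g; s/.*/s\/&\/Ciao!\/g/
EOF

# Step 3: derive the replacement script from the literal string.
sed -f "$work/str-to-regex.sed" "$work/literal-srch-strings.txt" > "$work/replace-text.sed"
derived=$(cat "$work/replace-text.sed")

# Step 4: run it against an input file (here, the literal file itself).
final=$(sed -f "$work/replace-text.sed" "$work/literal-srch-strings.txt")
printf '%s\n' "$final"   # prints: Ciao!

rm -r "$work"
```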
