Using a table or script in sed to replace many special characters with escape characters

command linesed

If u want replace special characters using sed you can use different ways, but the problem is you have to replace many (100+) special characters with escape characters in many files.

so it needs: (thanks Peter)

^^ to escape a single ^
^| to escape |
\& to escape &
\/ to escape /
\\​ to escape \

Suppose to have 100+ strings examples in many files:

sed.exe -i "s/{\*)(//123/
sed -i "s/\\/123/g;" 1.txt
sed.exe -i "s/{\*)(//123/
sed -i "s/\\/123/g;" 1.txt
.....
.....

these strings containing many special characters to escape (we have 100+ strings)..
Escaping manually is a very long work..so i need create a table script similar to wReplace to call in command prompt for escaping special characters and then replacing them with my words.
How can i do?

Best Answer

Note that ^^ for ^, and ^| for |, and ^& for &... are not a requirement of sed. The ^ escape-character is required by the CMD-shell. If your text is exposed to neither the command-line nor a command parameter in a .cmd/.bat command-script, you only need to consider sed's escape-character which is a backslash \​ ... They are two quite seperate scopes (which can overlap, so it is often better to keep it all withn sed's scope, as the following does.

Here is a sed script which will replace any number of find-strings you sepcify, with their complementary replacement-string. The general format of the strings is a cross between a sed substitution-command (s/abc/xyz/p) and a tabular format. You can "stretch" the middle delimiter so that you can line things up.
You can use a FIXED string pattern (F/...), or a normal sed-style regular expression pattern (s/...)... and you can adjust sed -n and each /p(in table.txt) as needed.

You need 3 files for a minimal run (and a 4th, dynamically derrived from table.txt):

  1. the main script   table-to-regex.sed
  2. the table file       table.txt
  3. the target file     file-to-chanage.txt
  4. derrived script    table-derrived.sed

To run one table against one target file.

sed -nf table-to-regex.sed  table.txt > table-derrived.sed
# Here, check `table-derrived.sed` for errors as described in the example *table.txt*.  

sed -nf table-derrived.sed  file-to-change.txt
# Redirect *sed's* output via `>` or `>>` as need be, or use `sed -i -nf` 

If you want to run table.txt against many files, just put the above code snippet into a simple loop to suit your requirements. I can do it trivially in bash, but someone more aware of the Windows CMD-shell would be better suited than I to set that up.


Here is the script: table-to-regex.sed

s/[[:space:]]*$//  # remove trailing whitespace

/^$\|^[[:space:]]*#/{p; b}  # empty and sed-style comment lines: print and branch
                            # printing keeps line numbers; for referencing errors

/^\([Fs]\)\(.\)\(.*\2\)\{4\}/{  # too many delims ERROR
      s/^/# error + # /p        # print a flagged/commented error
      b }                       # branch

/^\([Fs]\)\(.\)\(.*\2\)\{3\}/{                  # this may be a long-form 2nd delimiter
   /^\([Fs]\)\(.\)\(.*\2[[:space:]]*\2.*\2\)/{  # is long-form 2nd delimiter OK?
      s/^\([Fs]\)\(.\)\(.*\)\2[[:space:]]*\2\(.*\)\2\(.*\)/\1\2\n\3\n\4\n\5/
      t OK                                      # branch on true to :OK
   }; s/^/# error L # /p                        # print a flagged/commented error
      b }                                       # branch: long-form 2nd delimiter ERROR

/^\([Fs]\)\(.\)\(.*\2\)\{2\}/{     # this may be short-form delimiters
   /^\([Fs]\)\(.\)\(.*\2.*\2\)/{   # is short-form delimiters OK?
      s/^\([Fs]\)\(.\)\(.*\)\2\(.*\)\2\(.*\)/\1\2\n\3\n\4\n\5/
      t OK                         # branch on true to :OK  
   }; s/^/# error S # /p           # print a flagged/commented error
      b }                          # branch: short-form delimiters ERROR

{ s/^/# error - # /p        # print a flagged/commented error
  b }                       # branch: too few delimiters ERROR

:OK     # delimiters are okay
#============================
h   # copy the pattern-space to the hold space

# NOTE: /^s/ lines are considered to contain regex patterns, not FIXED strings.
/^s/{    s/^s\(.\)\n/s\1/   # shrink long-form delimiter to short-form
     :s; s/^s\(.\)\([^\n]*\)\n/s\1\2\1/; t s  # branch on true to :s 
      p; b }                                  # print and branch

# The following code handles FIXED-string /^F/ lines

s/^F.\n\([^\n]*\)\n.*/\1/  # isolate the literal find-string in the pattern-space
s/[]\/$*.^|[]/\\&/g        # convert the literal find-string into a regex of itself
H                          # append \n + find-regex to the hold-space

g   # Copy the modified hold-space back into the pattern-space

s/^F.\n[^\n]*\n\([^\n]*\)\n.*/\1/  # isolate the literal repl-string in the pattern-space
s/[\/&]/\\&/g                      # convert the literal repl-string into a regex of itself
H                                  # append \n + repl-regex to the hold-space

g   # Copy the modified hold-space back into the pattern-space

# Rearrange pattern-space into a / delimited command: s/find/repl/...      
s/^\(F.\)\n\([^\n]*\)\n\([^\n]*\)\n\([^\n]*\)\n\([^\n]*\)\n\([^\n]*\)$/s\/\5\/\6\/\4/

p   # Print the modified find-and-replace regular expression line

Here is an example table file, with a description of how it works: table.txt

# The script expects an input table file, which can contain 
#   comment, blank, and substitution lines. The text you are
#   now reading is part of an input table file.

# Comment lines begin with optional whitespace followed by #

# Each substitution line must start with: 's' or 'F'
#  's' lines are treated as a normal `sed` substitution regular expressions
#  'F' lines are considered to contain `FIXED` (literal) string expressions 
# The 's' or 'F' must be followed by the 1st of 3 delimiters   
#   which must not appear elsewhere on the same line.
# A pre-test is performed to ensure conformity. Lines with 
#   too many or too few delimiters, or no 's' or 'F', are flagged   
#   with the text '# error ? #', which effectively comments them out.
#   '?' can be: '-' too few, '+' too many, 'L' long-form, 'S' short-form
#   Here is an example of a long-form error, as it appears in the output. 

# error L # s/example/(7+3)/2=5/

# 1st delimiter, eg '/' must be a single character.
# 2nd (middle) delimiter has two possible forms:
#   Either it is exactly the same as the 1st delimiter: '/' (short-form)
#   or it has a double-form for column alignment: '/      /' (long-form)
#   The long-form can have any anount of whitespace between the 2 '/'s   
# 3rd delimiter must be the same as the 1st delimiter,

# After the 3rd delimiter, you can put any of sed's 
#    substitution commands, eg. 'g'

# With one condition, a trailing '#' comment to 's' and 'F' lines is
#    valid. The condition is that no delimiter character can be in the 
#    comment (delimiters must not appear elsewhere on the same line)

# For 's' type lines, it is implied that *you* have included all the 
#    necessary sed-escape characters!  The script does not add any 
#    sed-escape characters for 's' type lines. It will, however, 
#    convert a long-form middle-delimiter into a short-form delimiter.   

# For 'F' type lines, it is implied that both strings (find and replace) 
#    are FIXED/literal-strings. The script does add the  necessary 
#    sed-escape characters for 'F' type lines. It will also 
#    convert a long-form middle-delimiter into a short-form delimiter.   

# The result is a sed-script which contains one sed-substitution 
#    statement per line; it is just a modified version of your 
#    's' and 'F' strings "table" file.

# Note that the 1st delimiter is *always* in column 2.

# Here are some sample 's' and 'F' lines, with comments:
#

F/abc/ABC/gp               #-> These 3 are the same for 's' and 'F', 
s/abc/ABC/gp               #-> as no characters need to be escaped,  
s/abc/         /ABC/gp     #-> and the 2nd delimiter shrinks to one  

F/^F=Fixed/    /\1okay/p   # \1 is okay here, It is a FIXED literal
s|^s=sed regex||\1FAIL|p   # \1 will FAIL: back-reference not defined!

F|\\\\|////|               # this line == next line 
F|\\\\|        |////|p     # this line == previous line  
s|\\\\|        |////|p     # this line is different; 's' vs 'F'

F_Hello! ^.&`//\\*$/['{'$";"`_    _Ciao!_   # literal find / replace    

Here is a sample input file whose text you wish to change: file-to-chanage.txt

abc abc
^F=Fixed
   s=sed regex
\\\\ \\\\ \\\\ \\\\
Hello! ^.&`//\\*$/['{'$";"`
some non-matching text
Related Question