Try this (GNU awk is needed, because gensub() is a gawk extension):
awk '{a=gensub(/.*#([0-9]+)(\").*/,"\\1","g",$0);if(a~/[0-9]+/) {gsub(/[0-9]+\"/,a+11"\"",$0);}print $0}' YourFile
Test with your example:
kent$ echo '(bookmarks
("Chapter 1 Introduction 1" "#1"
("1.1 Problem Statement and Basic Definitions 2" "#2")
("Exercises 30" "#30")
("Notes and References 34" "#34"))
)
'|awk '{a=gensub(/.*#([0-9]+)(\").*/,"\\1","g",$0);if(a~/[0-9]+/) {gsub(/[0-9]+\"/,a+11"\"",$0);}print $0}'
(bookmarks
("Chapter 1 Introduction 12" "#12"
("1.1 Problem Statement and Basic Definitions 13" "#13")
("Exercises 41" "#41")
("Notes and References 45" "#45"))
)
Note that this command won't work if the two numbers on a line (e.g. 1" and "#1") are different, or if a line contains more numbers followed by a double quote (e.g. 23" ... 32" ... "#123"). In both cases every number followed by a double quote is replaced with the #-number plus 11.
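If gawk is not available, the same idea can be sketched in POSIX awk by using match() and substr() instead of gensub(). This is my own rewrite, not the command above. The function name shift_pages and its offset argument are hypothetical, and it keeps the same limitation: every number followed by a double quote gets the #-number plus the offset.

```shell
# Hypothetical helper: add an offset (default 11) to the page numbers read from
# standard input, using only POSIX awk (match/substr instead of gensub).
shift_pages() {
  awk -v off="${1:-11}" '
    # Lines containing #<digits>" : RSTART/RLENGTH locate that match.
    match($0, /#[0-9]+"/) {
      n = substr($0, RSTART + 1, RLENGTH - 2) + off   # digits between # and "
      gsub(/[0-9]+"/, n "\"")   # as in the gawk version: every <digits>" changes
    }
    { print }
  '
}

printf '%s\n' '("Chapter 1 Introduction 1" "#1"' '("Exercises 30" "#30")' | shift_pages 11
# ("Chapter 1 Introduction 12" "#12"
# ("Exercises 41" "#41")
```

With a file, run it as shift_pages 11 < YourFile.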
UPDATE
Since @Tim (the OP) said that the number followed by " on the same line could be different, I made some changes to my previous solution, and it now works for your new example.
By the way, the example looks like a table of contents structure to me, so I don't see how the two numbers could differ: the first would be the printed page number, and the second (with #) would be the page index. Am I right?
Anyway, you know your requirements best. Here is the new solution, still with gawk (I break the command into lines to make it easier to read):
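awk 'BEGIN{FS=OFS="\" \"#"}              # fields are separated by: " "#
{
  if(NF<2){print;next;}                   # no separator on this line: print it as is
  a=gensub(/.* ([0-9]+)$/,"\\1","g",$1);  # printed page number at the end of field 1
  b=gensub(/([0-9]+)\"/,"\\1","g",$2);    # field 2 without the quote after the number; b+11 uses its leading digits
  gsub(/[0-9]+$/,a+11,$1);                # add 11 to the printed page number
  gsub(/^[0-9]+/,b+11,$2);                # add 11 to the index number
  print $1,$2                             # rejoin the two fields with OFS
}' yourFile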
Test with your new example:
kent$ echo '(bookmarks
("Chapter 1 Introduction 1" "#1"
("1.1 Problem Statement and Basic Definitions 23" "#2")
("Exercises 31" "#30")
("Notes and References 42" "#34"))
)
'|awk 'BEGIN{FS=OFS="\" \"#"}{if(NF<2){print;next;}
a=gensub(/.* ([0-9]+)$/,"\\1","g",$1);
b=gensub(/([0-9]+)\"/,"\\1","g",$2);
gsub(/[0-9]+$/,a+11,$1);
gsub(/^[0-9]+/,b+11,$2);
print $1,$2
}'
(bookmarks
("Chapter 1 Introduction 12" "#12"
("1.1 Problem Statement and Basic Definitions 34" "#13")
("Exercises 42" "#41")
("Notes and References 53" "#45"))
)
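The same field-splitting approach also works without gawk. The sketch below is a POSIX awk rewrite (not the command above) that shifts the printed page number and the # index independently, and takes the offset as an argument instead of a hard-coded 11. shift_fields is a hypothetical name.

```shell
# Hypothetical helper: POSIX-awk sketch of the field-splitting solution.
# The printed page number and the "#" index number are shifted separately,
# so they may differ. The offset comes from -v instead of a hard-coded 11.
shift_fields() {
  awk -v off="${1:-11}" '
    BEGIN { FS = OFS = "\" \"#" }   # separator: double quote, space, double quote, #
    NF < 2 { print; next }          # lines without the separator pass through
    {
      # Field 1 ends with the printed page number.
      if (match($1, /[0-9]+$/)) {
        num = substr($1, RSTART) + off
        $1 = substr($1, 1, RSTART - 1) num
      }
      # Field 2 starts with the index number.
      if (match($2, /^[0-9]+/)) {
        num = substr($2, 1, RLENGTH) + off
        $2 = num substr($2, RLENGTH + 1)
      }
      print                         # assigning a field rebuilt $0 with OFS
    }
  '
}

printf '%s\n' '(bookmarks' '("Exercises 31" "#30")' ')' | shift_fields 11
# (bookmarks
# ("Exercises 42" "#41")
# )
```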
EDIT2, based on @Tim's comment:
(1) Does FS=OFS="\" \"#" mean the field separator in both input and output is double quote, space, double quote and #? Why specify the double quote twice?
You are right: the same separator is used for both input and output. It is defined as:
" "#
There are two double quotes because that makes it easy to catch the two numbers you want (based on your example input): the printed page number ends up at the end of field 1, and the page index at the start of field 2.
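To see how the separator isolates the two numbers, the sketch below prints each field of one example line. show_fields is a hypothetical helper name; the separator is the one from the solution.

```shell
# Hypothetical helper: print each field produced by the separator " "#
# (double quote, space, double quote, #), one per line.
show_fields() {
  awk 'BEGIN { FS = "\" \"#" } { for (i = 1; i <= NF; i++) print i ": " $i }'
}

printf '%s\n' '("Exercises 31" "#30")' | show_fields
# 1: ("Exercises 31
# 2: 30")
```

This is why the solution anchors one regex with $ (end of field 1) and the other with ^ (start of field 2).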
(2) In /.* ([0-9]+)$/, does $ mean the end of the string?
Exactly!
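A quick way to see what the $ anchor does: on field 1 of the example line, the anchored regex changes only the trailing number, while the unanchored one changes the first number it finds. anchored and unanchored are hypothetical helper names; sub() is plain POSIX awk and replaces only the first match.

```shell
# Compare an anchored and an unanchored regex on field 1 of the example.
anchored()   { awk '{ sub(/[0-9]+$/, "N"); print }'; }  # number at the end only
unanchored() { awk '{ sub(/[0-9]+/, "N"); print }'; }   # first number anywhere

printf '%s\n' '("Chapter 1 Introduction 12' | anchored     # ("Chapter 1 Introduction N
printf '%s\n' '("Chapter 1 Introduction 12' | unanchored   # ("Chapter N Introduction 12
```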
(3) In the third argument of gensub(), what is the difference between "g" and "G"?
There is no difference between G and g. Check this out:
gensub(regexp, replacement, how [, target])
Search the target string target for matches of the regular expression regexp. If "how" is a string beginning with ‘g’ or ‘G’ (short for “global”), then replace all matches of regexp with replacement.
This is from http://www.gnu.org/s/gawk/manual/html_node/String-Functions.html. Read it for the detailed usage of gensub.
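To see the how argument in action, the function below applies the same substitution with "g", "G" and a number. It needs gawk, since gensub() is a gawk extension; gensub_demo is a hypothetical name.

```shell
# gawk only: gensub() returns the modified string and leaves its target ($0
# here, since no fourth argument is given) unchanged.
gensub_demo() {
  echo 'a1b2c3' | gawk '{
    print gensub(/[0-9]/, "#", "g")   # "g": replace every match
    print gensub(/[0-9]/, "#", "G")   # "G": same as "g"
    print gensub(/[0-9]/, "#", 2)     # a number: replace only that match (the 2nd)
    print $0                          # the record itself is untouched
  }'
}
```

Running gensub_demo prints a#b#c#, a#b#c#, a1b#c3 and a1b2c3 on four lines.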
Portable solution using sed:
sed '
:1
/[aA][bB][cC][dD][eE][fF]/!b
s//\
&\
pqrstu\
PQRSTU\
/;:2
s/\n[[:lower:]]\(.*\n\)\(.\)\(.*\n\).\(.*\n\)/\2\
\1\3\4/;s/\n[^[:lower:]]\(.*\n\).\(.*\n\)\(.\)\(.*\n\)/\3\
\1\2\4/;t2
s/\n.*\n//;b1'
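It's a bit easier with GNU sed (the script is single-quoted, with the shell variables spliced in, so that an interactive bash does not treat !b as a history expansion):
search=abcdef replace=pqrstuvwx
sed -r ':1;/'"$search"'/I!b;s//\n&&&\n'"$replace"'\n/;:2
s/\n[[:lower:]](.*\n)(.)(.*\n)/\l\2\n\1\3/
s/\n[^[:lower:]](.*\n)(.)(.*\n)/\u\2\n\1\3/;t2
s/\n.*\n(.*)\n/\1/g;b1'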
By using &&& above, we reuse the case pattern of the matched string for the rest of the replacement, so ABcdef would be changed to PQrstuVWx and AbCdEf to PqRsTuVwX. Change it to & to affect only the case of the first 6 characters.
(Note that it may not do what you want, or may run into an infinite loop, if the replacement is itself subject to substitution, for instance when substituting foo for foo, or bcd for abcd.)
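To make the effect of &&& versus & concrete, here is the case-copying rule of the GNU sed command rewritten as a small POSIX awk function. It is an illustrative sketch, not the sed command itself: apply_case is a hypothetical name, and the lowercase test assumes ASCII letters. Each replacement character is lowercased where the corresponding template character is lowercase (like \l) and uppercased otherwise (like \u); characters beyond the template are kept as written.

```shell
# Hypothetical helper: POSIX-awk sketch of the case-copying idea (not the sed
# command itself). Arguments: matched text, replacement, and how many copies of
# the matched text form the case template (3 mirrors "&&&", 1 mirrors "&").
apply_case() {
  awk -v m="$1" -v r="$2" -v k="${3:-3}" 'BEGIN {
    t = ""
    for (i = 0; i < k + 0; i++) t = t m   # case template: m repeated k times
    out = ""
    for (i = 1; i <= length(r); i++) {
      c = substr(r, i, 1)
      if (i <= length(t)) {
        tc = substr(t, i, 1)
        # ASCII assumption: tc is lowercase if toupper() changes it
        if (tc != toupper(tc)) c = tolower(c); else c = toupper(c)
      }
      out = out c                          # past the template: c is kept as written
    }
    print out
  }'
}

apply_case ABcdef pqrstuvwx 3   # PQrstuVWx
apply_case AbCdEf pqrstuvwx 3   # PqRsTuVwX
apply_case ABcdef pqrstuvwx 1   # PQrstuvwx
```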
Here's one way with sed:
How it works:
The 1st sed turns dictionary.txt into a script file (editing commands, one per line). This is piped to the 2nd sed (note the -f -, which means read commands from stdin), which executes those commands, editing novel.txt.
This requires translating your format into a sed command and escaping any special characters in the process, for both the LHS and the RHS:
So the first substitution turns
"STRING" : "REPLACEMENT"
into STRING\nREPLACEMENT (\n is a newline char). The result is then copied over the hold space (h). s|.*\n|| deletes the first part, keeping only REPLACEMENT, then s|[\&/]|\\&|g escapes the reserved characters (this is the RHS). It then exchanges (x) the hold buffer with the pattern space, s|\n.*|| deletes the second part, keeping only STRING, and s|[[\.*^$/]|\\&|g does the escaping (this is the LHS). The content of the hold buffer is then appended to the pattern space via G, so the pattern space now contains ESCAPED_STRING\nESCAPED_REPLACEMENT. The final substitution transforms it into
s/ESCAPED_STRING/ESCAPED_REPLACEMENT/g
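The answer's original code is missing from this copy. Based only on the step-by-step description above, the generator could look roughly like the sketch below. Treat it as a reconstruction under assumptions: the function name dict2sed, the exact first regex, and the line that skips non-dictionary lines are mine, and, like the answer, it relies on sed reading the script from stdin with -f - (GNU sed does).

```shell
# Hypothetical reconstruction (not the answer's original code). dict2sed reads
# lines like  "STRING" : "REPLACEMENT"  and prints one s/STRING/REPLACEMENT/g
# command per line, with sed special characters escaped on both sides.
dict2sed() {
  sed '
# assumption: drop lines that are not in the dictionary format
/^".*" : ".*"$/!d
# "STRING" : "REPLACEMENT"  ->  STRING<newline>REPLACEMENT
s|^"\(.*\)" : "\(.*\)"$|\1\
\2|
# copy both parts to the hold space
h
# keep only REPLACEMENT and escape \ & / (the RHS)
s|.*\n||
s|[\&/]|\\&|g
# exchange: pattern space holds both parts again, hold space the escaped RHS
x
# keep only STRING and escape [ \ . * ^ $ / (the LHS)
s|\n.*||
s|[[\.*^$/]|\\&|g
# append the escaped RHS after a newline
G
# ESCAPED_STRING<newline>ESCAPED_REPLACEMENT -> s/ESCAPED_STRING/ESCAPED_REPLACEMENT/g
s|\(.*\)\n\(.*\)|s/\1/\2/g|
' "$1"
}

# Usage: dict2sed dictionary.txt | sed -f - novel.txt
```

For example, the dictionary line "a.b" : "x/y&z" becomes s/a\.b/x\/y\&z/g, so the dot only matches a literal dot and the / and & in the replacement are inserted literally.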