I am writing a bash script (just learning bash) to extract some lines from a file based on two patterns. The first pattern is just a sentence ending in a colon. The second pattern is a * repeated N (in this case 58) times.
An example file:
lines I do not want
lines I do not want
lines I do not want
A sentence here:
********************************************************
lines I want
lines I want
lines I want
**********************************************************
lines I do not want
lines I do not want
lines I do not want
Desired output:
A sentence here:
********************************************************
lines I want
lines I want
lines I want
**********************************************************
I can get the script to work if I explicitly type out "A sentence here" and \* 58 times within the call to awk, but for cleanliness and readability I would prefer to do something like below:
pat1="A sentence here"
pat2=`printf -- '\*%.s' {1..58} ; echo`
pat2=${pat2//\\/\\\\}
awk -v pat1="${pat1}" -v pat2="${pat2}" '/{pat1}/ {p=1}; p; /{pat2}/ {p=0}' $1
Here, the first positional parameter ($1) is the input file. The above code returns nothing. I initially tried it without the substitution on pat2, but got the warning:
awk: warning: escape sequence `\*' treated as plain `*'
I will have to run this command thousands of times and would ideally like a solution that is both clean and efficient. I'm not tied to using awk at all.
Edit:
I just noticed that even when I manually type the patterns into awk, I still receive the warning message. I am likely not passing the variables to awk correctly.
Best Answer
Several options here:
pat1, pat2 treated as regexps:
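A minimal sketch of the regexp approach, assuming the patterns are handed to awk through the environment and read back with ENVIRON, so their backslashes arrive unchanged. The helper names extract_regex and extract_regex_flag are hypothetical. The end regexp spells out 58 escaped stars instead of using \*{58}, so the sketch does not depend on awk supporting the {} operator.

```shell
# Sketch: print from the line matching PAT1 through the next line matching
# PAT2, both treated as extended regular expressions and passed through
# the environment. extract_regex / extract_regex_flag are hypothetical names.

# End regexp: ^, then 58 escaped stars, then $.
# With an awk that supports interval expressions (e.g. gawk >= 4.0.0),
# '^\*{58}$' would be equivalent; spelling it out avoids relying on {}.
end_re="^$(printf '%058d' 0 | sed 's/0/\\*/g')\$"

# start-condition, end-condition form (an awk range pattern)
extract_regex() {
  # $1: start regexp, $2: end regexp, $3: input file
  PAT1=$1 PAT2=$2 awk '$0 ~ ENVIRON["PAT1"], $0 ~ ENVIRON["PAT2"]' "$3"
}

# Equivalent logic using a p flag, as in the question
extract_regex_flag() {
  # $1: start regexp, $2: end regexp, $3: input file
  PAT1=$1 PAT2=$2 awk '
    $0 ~ ENVIRON["PAT1"] { p = 1 }
    p
    $0 ~ ENVIRON["PAT2"] { p = 0 }' "$3"
}
```

For example: extract_regex '^A sentence here:$' "$end_re" input.txt (input.txt standing for your file). The end condition is tested on every line of the range, so output stops at the first line of exactly 58 stars; if the rule directly under the sentence were also 58 stars long, the range would close there.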
Note that mawk and versions of gawk prior to 4.0.0 do not support the {} extended regular expression operator. For old versions of gawk, you can pass the POSIXLY_CORRECT environment variable to make it recognise it.
Here I'm using the start-condition, end-condition [{action}] approach, but you could do the same with your p flag approach.
pat1, pat2 treated as fixed strings:
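A sketch of the fixed-string variants, passing the strings through ENVIRON (the helper names extract_fixed and extract_exact are hypothetical): the first uses index() for a substring match, the second compares whole lines as strings. Nothing in the strings needs escaping, and the 58-star line is built with printf and tr.

```shell
# Sketch: same extraction with PAT1 and PAT2 treated as fixed strings.
# extract_fixed and extract_exact are hypothetical helper names.

stars=$(printf '%058d' 0 | tr 0 '*')   # 58 literal '*' characters

# index() returns the position of the needle in $0, or 0 if absent,
# so each condition is true when the string occurs anywhere in the line
extract_fixed() {
  # $1: start string, $2: end string, $3: input file
  PAT1=$1 PAT2=$2 awk '
    index($0, ENVIRON["PAT1"]), index($0, ENVIRON["PAT2"])' "$3"
}

# Whole-line comparison: ($0 "") is a string, which forces a string
# comparison even when both sides look like numbers
extract_exact() {
  # $1: start string, $2: end string, $3: input file
  PAT1=$1 PAT2=$2 awk '
    ($0 "") == ENVIRON["PAT1"], ($0 "") == ENVIRON["PAT2"]' "$3"
}
```

For example: extract_exact 'A sentence here:' "$stars" input.txt. With extract_fixed, a line containing more than 58 consecutive stars would also end the range, since index() only looks for the substring.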
Here, index() searches for the needle (the variable content) anywhere in the haystack (the current record, i.e. the line), but you could also do a simple full-line comparison with == (concatenating an empty string "" to one operand forces a string comparison even in cases where both $0 and ENVIRON["patx"] are numerical).
Avoid using -v to pass data that may contain backslash characters: awk does some C escape sequence processing (\n, \b, \\...) on them, so you'd need to escape the backslashes (and with GNU awk 4.2 or above, values that start with @/ and end in / are also a problem). The same goes for variables passed like awk '...code...' awkvar="$shellvar". Use ENVIRON or ARGV instead.
See this answer to a related question for further details.
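To illustrate the -v pitfall described above, here is a small sketch (the helper names show_v, show_env and show_argv are hypothetical) that prints a value containing a doubled backslash as awk receives it. POSIX specifies that a -v value is processed like an awk string literal, so \\ collapses to a single \, while ENVIRON and ARGV values arrive unchanged.

```shell
# Sketch: how a value containing backslashes reaches awk through -v,
# ENVIRON and ARGV. show_v, show_env and show_argv are hypothetical names.

show_v() {
  # -v values are processed like an awk string literal, so escape
  # sequences such as \\ and \n are interpreted
  awk -v val="$1" 'BEGIN { print val }'
}

show_env() {
  # environment values are passed through unchanged
  VAL=$1 awk 'BEGIN { print ENVIRON["VAL"] }'
}

show_argv() {
  # ARGV elements are passed through unchanged; deleting ARGV[1] stops awk
  # from treating it as an input file when the program has main rules
  awk 'BEGIN { val = ARGV[1]; delete ARGV[1]; print val }' "$1"
}

show_v 'a\\b'      # prints a\b
show_env 'a\\b'    # prints a\\b
show_argv 'a\\b'   # prints a\\b
```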