I want to remove the unwanted data from the input below.
How do I delete the lines above a test1 line if they neither contain test1 nor end with a quote?
20 /test1/catergory="Food"
20 /test1/target="Adults, \"Goblins\", Elderly,
Babies, \"Witch\",
Faries"
**This is some unwanted data to remove**
20 /test1/type="Western"
20 /test1/end=category
**This is some unwanted data to remove**
20 /test1/Purpose=
20 /test1/my_purpose="To create
a fun-filled moment"
20 /test1/end=Purpose
Expected output:
20 /test1/catergory="Food"
20 /test1/target="Adults, \"Goblins\", Elderly,
Babies, \"Witch\",
Faries"
20 /test1/type="Western"
20 /test1/end=category
20 /test1/Purpose=
20 /test1/my_purpose="To create
a fun-filled moment"
20 /test1/end=Purpose
I am stuck with these commands:
1. grep -B1 'test1' test_long_sentence.txt
2. sed '/test1/!d' test_long_sentence.txt
3. sed '/\"$/!d' test_long_sentence.txt
I do not know how to combine commands 2 and 3 (a single sed invocation with multiple regexes and an OR condition).
Best Answer
`lex` (or `flex` on Linux systems) is a program that takes a scanner/lexer specification and turns it into a C program. Its scanner specification is similar in nature to an `awk` program, but where `awk` is record oriented, `lex` is "character oriented".

Using `lex` with the following source in `lexer.l`:

This scanner uses an `OUTPUT` state to keep track of whether we want the current characters outputted or not. We enter this state with `BEGIN OUTPUT` when we find a line that looks like (this is handled by the first rule). We exit this state when a line ends and we're not currently scanning a quoted string (this is handled by the second rule).
A quoted string is started and ended with an un-escaped `"` character (the third rule). All other characters are passed through as is without action (the fourth rule).

While not in the `OUTPUT` state, we ignore everything (the last rule).

Note that this is a makeshift scanner written for your particular data. It does not handle quoted strings that end with an escaped backslash (`"some data \\"`), but it works on the data that you have shown.

Building it:

(on Linux, when using `flex`, you may have to use `make lexer LDLIBS=-ll`)

Using it:
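
For readers without `lex` at hand, the state machine described above can also be sketched in Python. This is not the lexer source from this answer, only a rough re-implementation of the five rules as described. The names `filter_records` and `START` are hypothetical, and the start-of-line pattern (a number, a space, then `/test1/`) is an assumption based on the sample data. Unlike the scanner described above, this sketch treats a backslash as escaping whatever character follows it.

```python
import re

# Assumption: a wanted record starts with a number, a space and "/test1/".
# The pattern used by the original lexer is not shown, so this is a guess
# based on the sample data in the question.
START = re.compile(r"\d+ /test1/")


def filter_records(text: str) -> str:
    """Keep only records that start with START, including quoted
    values that continue over several lines."""
    out = []
    outputting = False  # plays the role of the OUTPUT state
    in_quote = False    # True while inside an unterminated "..." value
    i = 0
    n = len(text)
    while i < n:
        at_line_start = i == 0 or text[i - 1] == "\n"
        if not outputting:
            if at_line_start and START.match(text, i):
                outputting = True  # first rule: enter OUTPUT
            else:
                i += 1             # last rule: ignore everything else
                continue
        ch = text[i]
        if ch == "\\" and i + 1 < n:
            # A backslash escapes the next character: copy both unchanged.
            out.append(text[i:i + 2])
            i += 2
            continue
        if ch == '"':
            in_quote = not in_quote  # third rule: un-escaped quote toggles
        elif ch == "\n" and not in_quote:
            outputting = False       # second rule: record ends here
        out.append(ch)               # fourth rule: pass the character through
        i += 1
    return "".join(out)
```

Under these assumptions, `filter_records(open("test_long_sentence.txt").read())` would return the text with the unwanted lines dropped, while continuation lines of a multi-line quoted value are kept because the state is only left at a newline outside a quoted string.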