I'm trying to find all patterns between a pair of double quotes. Let's say I have a file whose contents look like the following:
first matched is "One". the second is here"Two "
and here are in second line" Three ""Four".
I want the words below as output:
One
Two
Three
Four
As you can see, all strings in the output are between a pair of quotes.
What I tried is this command:
grep -Po ' "\K[^"]*' file
The above command works fine if I have a space before the first " mark of each pair. For example, it works if my input file contains the following:
first matched is "One". the second is here "Two "
and here are in second line " Three " "Four".
I know I can do this by combining multiple commands, but I'm looking for a single command, without running it multiple times. For example, I want to avoid something like the command below:
grep -oP '"[^"]*"' file | grep -oP '[^"]*'
How can I print all of my patterns using just one command?
Reply to comments: Removing the whitespace around the matched pattern inside a pair of quotes is not important to me, but it would be better if the command supported it too. Also, my files contain nested quotes like "foo "bar" zoo". All of the quoted words are on separate lines and none of them span multiple lines.
Thanks in advance.
Best Answer
First of all, your
grep -Po '"\K[^"]*' file
idea fails because grep sees both "One" and ". the second is here" as being inside quotes. Personally, I'd probably just use a two-stage pipeline, but that means two commands. To do it with a single command, you could use one of:
Perl
Here, the @F array holds all matches of the regex (a quote, followed by as many non-" characters as possible until the next "). The print for @F just means "print each element of @F".
Perl
To remove leading/trailing spaces from each match, use this:
Here, Perl is behaving like awk. The -a switch causes it to automatically split input lines into fields on the character given by -F. Since I have given it ", the fields of the first input line are:
$F[0] = first matched is 
$F[1] = One
$F[2] = . the second is here
$F[3] = Two 
Because we are looking for text between two consecutive field separators, we know we want every second field. So,
for($i=1;$i<=$#F;$i+=2){print $F[$i]}
will print the ones we care about.
The same idea, but in awk:
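A rough awk equivalent might look like this. It is a sketch, assuming the quotes on each line are balanced; sample() is a hypothetical helper that recreates the question's input:

```shell
# Hypothetical helper: prints the two sample lines from the question.
sample() {
  printf '%s\n' 'first matched is "One". the second is here"Two "' \
                'and here are in second line" Three ""Four".'
}

# -F'"' makes the double quote the field separator. awk numbers fields
# from 1, so the even-numbered fields ($2, $4, ...) hold the quoted text.
sample | awk -F'"' '{ for (i = 2; i <= NF; i += 2) print $i }'
```

Because awk fields start at 1 rather than 0, the loop begins at 2 instead of 1. This version does not trim whitespace inside the quotes.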