Ubuntu – How to find all patterns between two characters

awk grep regex sed text-processing

I'm trying to find all patterns between pairs of double quotes. Let's say I have a file whose contents look like the following:

first matched is "One". the second is here"Two "
and here are in second line" Three ""Four".

I want the words below as output:

One
Two
Three
Four

As you can see all strings in output are between a pair of quotes.

What I tried is this command:

grep -Po ' "\K[^"]*' file

The above command works only if there is a space before each opening " mark. For example, it works if my input file contains the following:

first matched is "One". the second is here "Two "
and here are in second line " Three " "Four".

I know I can do this by combining multiple commands, but I'm looking for a single command rather than one that is run several times, e.g. the command below:

grep -oP '"[^"]*"' file | grep -oP '[^"]*'

How can I print all of my patterns using just one command?

Reply to comments: Removing the whitespace around a matched pattern inside a pair of quotes is not important to me, but it would be better if the command supported it too. Also, my files contain nested quotes like "foo "bar" zoo". All of the quoted words stay within a single line; none of them extends over multiple lines.

Thanks in advance.

Best Answer

First of all, your grep -Po '"\K[^"]*' file idea fails because grep treats every " as a potential opening quote, so it sees both "One" and ". the second is here" as being inside quotes. Personally, I'd probably just do

$ grep -oP '"[^"]+"' file | tr -d '"'
One
Two 
 Three 
Four

But that is two commands. To do it with a single command, you could use one of:

Perl
```
$ perl -lne '@F=/"\s*([^"]+)\s*"/g; print for @F' file 
One
Two 
Three 
Four
```
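Here, the @F array holds all matches of the regex (a quote, optional whitespace, then as many non-" characters as possible until the next "). Because [^"]+ is greedy, it also takes any trailing spaces, so only the leading whitespace is stripped. The print for @F just means "print each element of @F".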

Perl

$ perl -F'"' -lane 'for($i=1;$i<=$#F;$i+=2){print $F[$i]}' file
One
Two 
 Three 
Four

To remove leading/trailing spaces from each match, use this:

perl -F'"' -lane 'for($i=1;$i<=$#F;$i+=2){$F[$i]=~s/^\s+|\s+$//g; print $F[$i]}' file

Here, Perl is behaving like awk. The -a switch (which -F implies on Perl 5.20 and later) causes it to automatically split each input line into the @F array on the character given by -F. Since I have given it ", the fields are:

$ perl -F'"' -lane 'for($i=0;$i<=$#F;$i++){print "Field $i: $F[$i]"}' file
Field 0: first matched is 
Field 1: One
Field 2: . the second is here
Field 3: Two 
Field 0: and here are in second line
Field 1:  Three 
Field 2: 
Field 3: Four
Field 4: .

Because we are looking for text between two consecutive field separators, we know we want every second field. So, for($i=1;$i<=$#F;$i+=2){print $F[$i]} will print the ones we care about.

The same idea but in awk:

$ awk -F'"' '{for(i=2;i<=NF;i+=2){print $(i)}}' file 
One
Two 
 Three 
Four
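The awk loop can also trim each field before printing. The following is a minimal sketch, assuming that only spaces and tabs need removing. The `i < NF` bound (instead of `i <= NF`) is a tweak of mine, so that text following an unmatched opening quote is not printed:

```shell
# Sample input from the question.
input='first matched is "One". the second is here"Two "
and here are in second line" Three ""Four".'

# Even-numbered fields sit between quotes; strip blanks from both ends.
words=$(printf '%s\n' "$input" | awk -F'"' '{
  for (i = 2; i < NF; i += 2) {
    s = $i
    gsub(/^[ \t]+|[ \t]+$/, "", s)
    print s
  }
}')
printf '%s\n' "$words"
```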

Related Solutions

Ubuntu – Difference between two files of different row numbers

Explanation

-F, : set the field separator to ,. Now, the first comma-separated field of each line will be $1, the second $2 and so on.
NR==FNR : these are two awk special variables. NR is the number of the current input line counted across all files, and FNR is the line number within the current file. The two will be equal only while the 1st file is being read.
NR==FNR{a[$1$2$3$4]++; next} : while reading the 1st file, use the first 4 fields, concatenated, as a key in the array a and increment its value. This basically records the first 4 fields of every line of the first file. The next ensures that we immediately skip to the next line and don't process the rest of the script.
!a[$1$2$3$4] : the default action of awk is to print the current line. So, if you use something that evaluates to true, awk understands that it should print this line. !a[$1$2$3$4] is true when a[$1$2$3$4] is not defined, which happens for lines of the second file whose first 4 fields were not present in any line of the first file. Therefore, this directive will cause all lines whose first 4 fields have never been seen (so they have no entry in the a array) to be printed.
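Assembled from the pieces described above, the whole command might look like the sketch below. The file names, their contents and the order in which the files are passed are assumptions made for illustration, since the original command is not shown here:

```shell
# Hypothetical sample files; names and contents are made up.
workdir=$(mktemp -d)
printf '%s\n' 'a,b,c,d,1' 'e,f,g,h,2' > "$workdir/first.csv"
printf '%s\n' 'a,b,c,d,9' 'w,x,y,z,3' > "$workdir/second.csv"

# While reading first.csv (NR==FNR): remember the first four fields.
# While reading second.csv: print lines whose first four fields are new.
unseen=$(awk -F, 'NR==FNR { a[$1$2$3$4]++; next } !a[$1$2$3$4]' \
  "$workdir/first.csv" "$workdir/second.csv")
printf '%s\n' "$unseen"
rm -rf "$workdir"
```

Here only w,x,y,z,3 is printed. Note that gluing the fields together without a separator can make different rows share a key (for example ab,c and a,bc), so a key such as $1 FS $2 FS $3 FS $4 may be safer on real data.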

Ubuntu – How to replace text between two patterns on different lines

This should work for you:

sed -e '/Security-Start/{ N; s/Security-Start.*Security-End/REDACTED/ }'

/Security-Start/ : search for "Security-Start".
N; : if it is found, append the next input line to the pattern space.
s/Security-Start.*Security-End/REDACTED/ : perform the replacement on the joined result.
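As a quick check of the two-line case, the sketch below runs the same script on made-up input (the sample text is hypothetical). The script is written over several lines here, a form that sed implementations other than GNU sed tend to accept more readily than a one-line { ... } group:

```shell
# Hypothetical input: the marked region spans exactly two lines.
input='before
Security-Start secret
still secret Security-End
after'

# N joins the matching line with the next one, then s replaces the span.
redacted=$(printf '%s\n' "$input" | sed -e '/Security-Start/{
N
s/Security-Start.*Security-End/REDACTED/
}')
printf '%s\n' "$redacted"
```

This prints before, REDACTED and after on three lines. If Security-End sits more than one line below Security-Start, the substitution finds nothing to replace and the text passes through unchanged.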

If the text spans more than two lines, use this one:

sed -n '1h; 1!H; ${ g; s/Security-Start.*Security-End/REDACTED/p }'
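A small sketch of the whole-file variant on made-up input, again with the script split over several lines for portability (the sample text and variable names are hypothetical):

```shell
# Hypothetical input: the marked region spans three lines.
input='before
Security-Start secret
middle line
Security-End
after'

# 1h copies the first line into the hold space, 1!H appends every later
# line, and on the last line ($) g brings the whole text back so that a
# single substitution can run across the newlines.
redacted_all=$(printf '%s\n' "$input" | sed -n '1h; 1!H; ${
g
s/Security-Start.*Security-End/REDACTED/p
}')
printf '%s\n' "$redacted_all"
```

Two properties follow from how the command is written. Because .* is greedy, a file with several marked regions has everything from the first Security-Start to the last Security-End collapsed into one REDACTED. And because printing is tied to the p flag under -n, nothing at all is printed when the pattern never matches.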

