How to use awk to extract data from a file based on the content of another file


I have two files. The first contains structured data; here is a sample:

article 1 title
article 1 body line 1
article 1 body line 2
+++
article 2 title
article 2 body line 1
article 2 body line 2
article 2 body line 3
+++
article 3 title
article 3 body line 1
article 3 body line 2
+++
article 4 title
article 4 body line 1
article 4 body line 2
article 4 body line 3

As you can see, +++ separates the records. In each record, the first line is the title and all remaining lines are its body. The second file is a plain text file listing titles, one per line. For example:

article 1 title
article 3 title
article 4 title
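To illustrate how awk can treat each +++-delimited block as a single record, here is a minimal sketch that prints the title (first line) of every record in the data file. It assumes an awk that accepts a regular expression as RS (for example GNU awk or mawk; POSIX awk only uses the first character of RS), and list_titles is a hypothetical helper name.

```shell
# list_titles DATA  (hypothetical helper name)
# Assumes an awk that accepts a regular expression as RS, e.g. GNU awk or mawk.
list_titles() {
    awk '
    BEGIN {
        RS = "[+][+][+]\r?\n"   # a "+++" line (LF or CRLF) ends each record
        FS = "\r?\n"            # each line of the record becomes one field
    }
    { print $1 }                # field 1 is the title line
    ' "$1"
}
```

Running list_titles data.txt on the sample above should print the four title lines, which can also serve as a starting point for the title list.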

I want to extract the records whose titles appear in the second file. For the example above, the expected result is:

article 1 title
article 1 body line 1
article 1 body line 2
+++
article 3 title
article 3 body line 1
article 3 body line 2
+++
article 4 title
article 4 body line 1
article 4 body line 2
article 4 body line 3

I suspect awk can solve this, but I don't know how.

What I've tried is this:

awk 'BEGIN{RS="(\r?\n)?\+{3}(\r?\n)?"; FS="\r?\n"; ORS="+++"} NR==FNR{a[$0];next} ...' title_list.txt data.txt

The problem is that the two files need different record separators (RS), and I don't know how to set RS per file.
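One way around the per-file RS problem, offered here as an assumption-based sketch rather than the approach used in this thread, is to read the title list yourself in the BEGIN block with getline var < file (at that point RS is still the default newline) and only then set RS and FS for the data file. extract_records_begin is a hypothetical helper name, and the regex RS again assumes GNU awk or mawk. The sketch also prints +++ only between selected records, to match the expected output exactly.

```shell
# extract_records_begin TITLES DATA  (hypothetical helper name)
# Assumes an awk that accepts a regular expression as RS, e.g. GNU awk or mawk.
extract_records_begin() {
    awk -v titles="$1" '
    BEGIN {
        # RS is still the default newline here, so getline reads one title per call.
        while ((getline line < titles) > 0) {
            sub(/\r$/, "", line)      # tolerate CRLF line endings
            want[line]
        }
        close(titles)
        # Only now switch to the separators used by the data file.
        RS = "[+][+][+]\r?\n"
        FS = "\r?\n"
    }
    $1 in want {
        sub(/\r?\n$/, "")             # drop the newline left before the +++ line
        printf "%s%s\n", sep, $0      # sep is empty before the first match
        sep = "+++\n"
    }
    ' "$2"
}
```

Usage: extract_records_begin title_list.txt data.txt. Because only one file operand is passed to awk, there is no need for the NR==FNR idiom.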

Best Answer

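You can set variables like RS separately for each file: a var=value operand on the awk command line is applied just before awk starts reading the file operand that follows it. For example: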

$ awk 'NR==FNR{a[$0];next} $1 in a' RS='\r?\n' title_list.txt RS='[+][+][+]\r?\n' FS='\r?\n' ORS='+++\n' data.txt
article 1 title
article 1 body line 1
article 1 body line 2
+++
article 3 title
article 3 body line 1
article 3 body line 2
+++
article 4 title
article 4 body line 1
article 4 body line 2
article 4 body line 3
+++
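The output above ends with an extra +++ line, because ORS is appended after every printed record, including the last one; the expected result in the question has no trailing separator. Below is a minimal sketch that keeps the answer's per-file RS/FS assignments but prints the separator only between selected records. extract_records is a hypothetical helper name, and the regex RS assumes GNU awk or mawk. The plus signs are written as [+] so they are matched literally; a bare +++ is not a well-defined extended regular expression, and depending on the implementation it may be rejected or read as "one or more +", which would also split on a body line ending in +.

```shell
# extract_records TITLES DATA  (hypothetical helper name)
# Same idea as the answer: RS/FS assignments placed between the file operands
# take effect for the file that follows them. Regex RS assumes GNU awk or mawk.
extract_records() {
    awk '
    NR == FNR { want[$0]; next }      # first file (assumed non-empty): one title per line
    $1 in want {
        sub(/\r?\n$/, "")             # drop the newline left before the +++ line
        printf "%s%s\n", sep, $0      # no separator before the first match
        sep = "+++\n"
    }
    ' RS='\r?\n' "$1" RS='[+][+][+]\r?\n' FS='\r?\n' "$2"
}
```

With the sample files from the question, extract_records title_list.txt data.txt should print the expected result without the trailing +++.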