Print a line only if the previous line includes a specific word

awk perl sed text-processing

We have the following file with host names and host IPs (a long file, with 90-100 machines listed per Linux machine).

hosts.cluster.conf

  "href" : "http://localhost:8080/api/v1/hosts/worker02.sys87.com",
  "Hosts" : 
    "cluster_name" : "hdp",
    "host_name" : "worker02.sys87.com",
    "ip" : "23.67.32.65"


  "href" : "http://localhost:8080/api/v1/hosts/worker03.sys87.com",
  "Hosts" : 
    "cluster_name" : "hdp",
    "host_name" : "worker03.sys87.com",
    "ip" : "23.67.32.66"


  "href" : "http://localhost:8080/api/v1/hosts/worker04.sys87.com",
  "Hosts" : 
    "host_name" : "worker04.sys87.com",
    "ip" : "23.67.32.67"


  "href" : "http://localhost:8080/api/v1/hosts/worker05.sys87.com",
  "Hosts" : 
    "cluster_name" : "hdp",
    "host_name" : "worker05.sys87.com",
    "ip" : "23.67.32.68"

We want to print each host_name line only if the line immediately before it contains the word "cluster_name".

Expected results:

"host_name" : "worker02.sys87.com",

"host_name" : "worker03.sys87.com",

"host_name" : "worker05.sys87.com",

Best Answer

Short awk solution:

awk '/cluster_name/{ cl=NR }/host_name/ && NR-1==cl' hosts.cluster.conf

/cluster_name/{ cl=NR } - stores the record number of each "cluster_name" line in cl
/host_name/ - matches a "host_name" line
NR-1==cl - ensures that the current "host_name" record (number NR) comes immediately after the last "cluster_name" record (number held in cl)
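
A variation on the same idea, as a minimal sketch: rather than storing the record number, keep the previous line in a variable (prev, a name chosen here for illustration) and test it directly. The sample file is a trimmed copy of the data from the question.

```shell
# Build a small sample file (trimmed from the question's data).
sample=$(mktemp)
cat > "$sample" <<'EOF'
  "Hosts" :
    "cluster_name" : "hdp",
    "host_name" : "worker02.sys87.com",
    "ip" : "23.67.32.65"
  "Hosts" :
    "host_name" : "worker04.sys87.com",
    "ip" : "23.67.32.67"
EOF

# Print a host_name line only when the line before it mentions cluster_name.
# prev is empty on the first line, so a leading host_name line is never printed.
result=$(awk '/host_name/ && prev ~ /cluster_name/; { prev = $0 }' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Only the worker02 line should come out, because worker04 has no cluster_name line directly above it.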

The output:

"host_name" : "worker02.sys87.com",
"host_name" : "worker03.sys87.com",
"host_name" : "worker05.sys87.com",

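If host_name can appear on the very first line (unlikely in real data), cl is still unset at that point and compares equal to NR-1, which is 0, so that line would be printed by mistake. To guard against this, use the following version: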

awk '/cluster_name/{ cl=NR }/host_name/ && cl && NR-1==cl' hosts.cluster.conf
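
To see the difference between the two versions, here is a small sketch that feeds both a hypothetical two-line input starting with host_name (the host name and IP are invented for the demonstration).

```shell
# Hypothetical input where host_name is the very first line.
input='"host_name" : "first.example.com",
"ip" : "10.0.0.1"'

# Without the guard: cl is unset, compares as 0, and NR-1 is 0 on line 1.
unguarded=$(printf '%s\n' "$input" | awk '/cluster_name/{ cl=NR } /host_name/ && NR-1==cl')

# With the guard: an unset cl is false, so the first line is skipped.
guarded=$(printf '%s\n' "$input" | awk '/cluster_name/{ cl=NR } /host_name/ && cl && NR-1==cl')

printf 'unguarded: [%s]\nguarded:   [%s]\n' "$unguarded" "$guarded"
```

The unguarded command prints the host_name line; the guarded one prints nothing.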

Using grep

Why not use grep's -r switch to recurse through the filesystem instead of using find? There are 2 additional switches I'd use too, alongside the -n switch.

$ grep -rHn PATTERN <DIR> | cut -d":" -f1-2

Example #1

$ grep -rHn PATH ~/.bashrc | cut -d":" -f1-2
/home/saml/.bashrc:25

Details

-r - recursively search through files + directories
-H - prints the file name with each match (less restrictive than -l), i.e. it can be combined with grep's other switches
-n - display the line number of the match
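
The three switches can be tried out safely on a throwaway directory. This is only a sketch; the directory layout and file contents are made up for the demonstration.

```shell
# Create a temporary directory tree to search (all names are illustrative).
dir=$(mktemp -d)
mkdir "$dir/sub"
printf 'alpha\nexport PATH=/usr/bin\n' > "$dir/a.txt"
printf 'PATH first\nbeta\nPATH again\n' > "$dir/sub/b.txt"

# -r recurses, -H prefixes the file name, -n adds the line number;
# cut keeps only the "file:line" part. sort makes the order predictable.
matches=$(cd "$dir" && grep -rHn PATH . | cut -d":" -f1-2 | LC_ALL=C sort)
printf '%s\n' "$matches"
rm -rf "$dir"
```

Note that the cut step assumes file names contain no colon; a name with a colon in it would be split in the wrong place.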

Example #2

$ grep -rHn PATH ~/.bash* | cut -d":" -f1-2
/home/saml/.bash_profile:10
/home/saml/.bash_profile:12
/home/saml/.bash_profile_askapache:99
/home/saml/.bash_profile_askapache:101
/home/saml/.bash_profile_askapache:118
/home/saml/.bash_profile_askapache:166
/home/saml/.bash_profile_askapache:218
/home/saml/.bash_profile_askapache:250
/home/saml/.bash_profile_askapache:314
/home/saml/.bash_profile_askapache:2317
/home/saml/.bash_profile_askapache:2323
/home/saml/.bashrc:25

Using find

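$ find . -exec sh -c 'grep -Hn PATTERN "$@" | cut -d":" -f1-2' sh {} +

The sh after the quoted script fills $0; without it, the first file name would be taken as $0 and dropped from "$@".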

Example

$ find ~/.bash* -exec sh -c 'grep -Hn PATH "$@" | cut -d":" -f1-2' sh {} +
/home/saml/.bash_profile:10
/home/saml/.bash_profile:12
/home/saml/.bash_profile_askapache:99
/home/saml/.bash_profile_askapache:101
/home/saml/.bash_profile_askapache:118
/home/saml/.bash_profile_askapache:166
/home/saml/.bash_profile_askapache:218
/home/saml/.bash_profile_askapache:250
/home/saml/.bash_profile_askapache:314
/home/saml/.bash_profile_askapache:2317
/home/saml/.bash_profile_askapache:2323
/home/saml/.bashrc:25

If you truly want to use find, you can have it run grep on the files it finds with -exec, as in the command above.
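
A simpler variant, sketched here under the assumption that only regular files should be searched: let find run grep directly and do the cut outside find, which avoids the sh -c wrapper altogether. The directory and file names are made up for the demonstration.

```shell
# Temporary directory with one matching and one non-matching file.
dir=$(mktemp -d)
printf 'one\nPATH here\n' > "$dir/x.conf"
printf 'nothing\n' > "$dir/y.conf"

# -type f skips directories; "{} +" passes many files to a single grep call.
# The pipe sits outside find, so no sh -c wrapper is needed.
found=$(cd "$dir" && find . -type f -exec grep -Hn PATH {} + | cut -d":" -f1-2 | LC_ALL=C sort)
printf '%s\n' "$found"
rm -rf "$dir"
```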

Print only the unique lines from a file, not the duplicates

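That's a job for uniq. With -u it prints only the lines that are not repeated; it compares adjacent lines only, so the file must already be sorted: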

$ LC_ALL=C uniq -u file
grapes
lime
peach
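
For unsorted input, putting a sort in front of uniq -u gives the same result. A minimal sketch with invented sample data (not the file from the question):

```shell
# Illustrative unsorted input: lime and apple each occur twice.
unsorted=$(mktemp)
printf 'lime\napple\npeach\nlime\ngrapes\napple\n' > "$unsorted"

# uniq only compares neighbouring lines, so sort first.
only_once=$(LC_ALL=C sort "$unsorted" | LC_ALL=C uniq -u)
printf '%s\n' "$only_once"
rm -f "$unsorted"
```

This prints grapes and peach, in sorted order rather than in the order they appear in the file.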

If you want other tools, like perl:

perl -nle '$h{$_}++; END {print for grep { $h{$_} == 1 } keys %h}' <file
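
If the original line order matters and the file is not sorted, one possible approach is to read the file twice with awk: count on the first pass, print on the second. This is a sketch with invented sample data; count is just an illustrative array name.

```shell
# Illustrative unsorted input: lime and apple each occur twice.
f=$(mktemp)
printf 'lime\napple\npeach\nlime\ngrapes\napple\n' > "$f"

# First pass (NR==FNR) counts every line; second pass prints lines seen once.
in_order=$(awk 'NR==FNR { count[$0]++; next } count[$0] == 1' "$f" "$f")
printf '%s\n' "$in_order"
rm -f "$f"
```

Here peach comes out before grapes, matching their order in the input.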