Ubuntu – Capture and compile a list of names from a log file

command linetext processing

I need a one-line command to compile and print all of the Expendable Launch Vehicle names listed in a log file.

The ELV names are all listed in capital letters under the /elv directory.

The output should appear in the format of one name per line, with no duplicates:

ALICE
BOB
CHARLIE

I tried

grep "GET" NASA_access_log_Aug95.txt | grep "ELV" | wc -l

but it only showed me the number of ELV not printed ELV names

Below is a sample of my log file NASA_access_log_Aug95.txt:

cc-rd6-mg1-dip4-9.massey.ac.nz - - [03/Aug/1995:20:43:31 -0400] "GET /elv/TITAN/mars1s.jpg HTTP/1.0" 200 1156
www-a2.proxy.aol.com - - [03/Aug/1995:20:43:31 -0400] "GET /elv/DELTA/dsolids.jpg HTTP/1.0" 200 24558
cc-rd6-mg1-dip4-9.massey.ac.nz - - [03/Aug/1995:20:43:32 -0400] "GET /elv/TITAN/mars3s.jpg HTTP/1.0" 200 1744
castor.gel.usherb.ca - - [03/Aug/1995:20:43:33 -0400] "GET /shuttle/missions/51-l/movies/ HTTP/1.0" 200 372
cc-rd6-mg1-dip4-9.massey.ac.nz - - [03/Aug/1995:20:43:33 -0400] "GET /elv/ATLAS_CENTAUR/atc69s.jpg HTTP/1.0" 200 1659
cc-rd6-mg1-dip4-9.massey.ac.nz - - [03/Aug/1995:20:43:35 -0400] "GET /elv/TITAN/mars2s.jpg HTTP/1.0" 200 1549
palona1.cns.hp.com - - [03/Aug/1995:20:43:36 -0400] "GET /shuttle/missions/sts-69/count69.gif HTTP/1.0" 200 46053
www-c1.proxy.aol.com - - [03/Aug/1995:20:43:38 -0400] "GET /shuttle/missions/sts-71/images/KSC-95EC-0882.gif HTTP/1.0" 200 51289
cc-rd6-mg1-dip4-9.massey.ac.nz - - [03/Aug/1995:20:43:40 -0400] "GET /elv/ATLAS_CENTAUR/acsuns.jpg HTTP/1.0" 200 2263
cc-rd6-mg1-dip4-9.massey.ac.nz - - [03/Aug/1995:20:43:41 -0400] "GET /elv/ATLAS_CENTAUR/goess.jpg HTTP/1.0" 200 1306
cc-rd6-mg1-dip4-9.massey.ac.nz - - [03/Aug/1995:20:43:45 -0400] "GET /elv/DELTA/dsolidss.jpg HTTP/1.0" 200 1629 

Best Answer

I would just use grep and sort -u:

$ grep -Po '/elv/\K[^/]+' NASA_access_log_Aug95.txt | sort -u
ATLAS_CENTAUR
DELTA
TITAN

The -P enables Perl Compatible Regular Expressions which let us use \K which means "ignore anything matched up to this point". The -o means "only show the matching portion of the line". Then, the regular expression means "look for /elv/, ignore everything matched until the /elv/, and then look for one or more non-/ characters ([^/]+).

Related Question