Text Processing – Extract Multiple Patterns from Line Regardless of Order

awk grep sed

I'm new to Unix scripting, so please bear with me.

I am given a file which has information on processes on each line. I need to extract certain information on these processes from each line.

Example of the file –

process1 port=1234 appID=dummyAppId1 authenticate=true <some more params>
process3 port=1244 authenticate=false appID=dummyAppId2 <some more params>
process2 appID=dummyAppId3 port=1235 authenticate=true <some more params>

The desired output is –

1
port=1234 authenticate=true appID=dummyAppId1 
2
port=1244 authenticate=false appID=dummyAppId2
3
port=1235 authenticate=true appID=dummyAppId3

The numbers 1, 2, and 3 on each line just denote the line number of the output file.

I have already tried using the sed s/ command but it is order-specific, while the parameters in the input file don't follow an order – as a result, some lines in the input file are skipped.

Here is my command –

sed -nr '/appID/s/(\w+).*port=([^ ]+) .*authenticate=([^ ]+) .*appID=([^ ]+) .*/\2\t\3\t\4/p' | sed =

Could anyone guide me on how to extract those parameters regardless of order?

Thanks!

Edit 1: I managed to use grep's look-behind zero-width assertion feature this way –

grep -Po '(?<=pattern1=)[^ ,]+|(?<=pattern2=)[^ ,]+|(?<=pattern3=)[^ ,]+|(?<=pattern4=)[^ ,]+' filename

but this prints every match on its own line, i.e.

1234
true
dummyAppId1

I'm trying to figure out how to get all matches from one input line onto a single output line using grep itself (i.e. not by merging every X lines into one afterwards).

Edit 2: Mixed up the order of the parameters in the input.

Edit 3: I'm sorry, I should have mentioned this earlier – perl seems to be restricted on the machines I work on. While the answers provided by Stephane and Sundeep work perfectly when I test them locally, they won't work on the machines this finally needs to run on.
It looks like awk, grep, and sed are the main supported options 🙁

Best Answer

With awk (tested with GNU awk, not sure if it works with other implementations)

$ cat kv.awk
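/appID/ {
    # reset, so a key missing on this line does not reuse an earlier value
    a = b = c = ""
    for (i = 1; i <= NF; i++) {
        # keep the whole key=value field when it starts with a wanted key
        $i ~ /^port=/ && (a = $i)
        $i ~ /^authenticate=/ && (b = $i)
        $i ~ /^appID=/ && (c = $i)
    }
    # line number on its own line, then the three fields joined by OFS
    print NR "\n" a, b, c
}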

$ awk -v OFS='\t' -f kv.awk ip.txt
1
port=1234   authenticate=true   appID=dummyAppId1
2
port=1244   authenticate=false  appID=dummyAppId2
3
port=1235   authenticate=true   appID=dummyAppId3
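
For awk implementations other than GNU awk, a more conservative sketch may be safer. The version below sticks to plain if/else and index(), clears the variables on every line, and numbers the output with its own counter instead of NR, so that skipped input lines leave no gaps in the numbering. The function name extract_params and the variable names are made up for illustration, and I have only reasoned about portability, not tested every awk.

```shell
# Hypothetical helper: reads process lines on stdin, prints a running
# number followed by the port/authenticate/appID fields in that order.
extract_params() {
    awk -v OFS='\t' '
    /appID=/ {
        # clear the values so nothing carries over from an earlier line
        port = auth = app = ""
        for (i = 1; i <= NF; i++) {
            # index(...) == 1 means the field starts with that key
            if (index($i, "port=") == 1) port = $i
            else if (index($i, "authenticate=") == 1) auth = $i
            else if (index($i, "appID=") == 1) app = $i
        }
        # n counts printed records, so the numbering follows the output
        print ++n
        print port, auth, app
    }'
}

# Sample lines from the question, with the keys in different orders.
printf '%s\n' \
    'process1 port=1234 appID=dummyAppId1 authenticate=true' \
    'process3 port=1244 authenticate=false appID=dummyAppId2' \
    'process2 appID=dummyAppId3 port=1235 authenticate=true' |
    extract_params
```

Replace the printf pipeline with extract_params < ip.txt to run it on a real file. If the input line number is wanted instead, print NR in place of ++n.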


With perl

$ # note that the order is changed for second line here
$ cat ip.txt
process1 port=1234 authenticate=true appID=dummyAppId1 <some more params>
process3 port=1244 appID=dummyAppId2 authenticate=false <some more params>
process2 port=1235 authenticate=true appID=dummyAppId3 <some more params>

$ perl -lpe 's/(?=.*(port=[^ ]+))(?=.*(authenticate=[^ ]+))(?=.*(appID=[^ ]+)).*/$1\t$2\t$3/; print $.' ip.txt 
1
port=1234   authenticate=true   appID=dummyAppId1
2
port=1244   authenticate=false  appID=dummyAppId2
3
port=1235   authenticate=true   appID=dummyAppId3
  • (?=.*(port=[^ ]+)) first capture group for port
  • (?=.*(authenticate=[^ ]+)) second capture group for authenticate and so on
  • print $. for line number
  • To avoid partial matches, use \bport, \bappID etc if word boundary is enough. Otherwise, use (?<!\S)(port=[^ ]+) to restrict based on whitespace.

If you need to print only lines containing appID or any other such condition, change -lpe to -lne and change print $. to print "$.\n$_" if /appID/
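
Since Edit 3 of the question rules out perl, here is a rough sed-only sketch of the same idea. sed has no lookahead, so instead each wanted key=value pair is moved to the front of the line, one substitution per key, and a final substitution keeps just those three fields. Assumptions: sed accepts -E (GNU and BSD sed do), the process name comes first so every key is preceded by whitespace, and lines missing any of the three keys should be dropped. The function name extract_with_sed is made up for illustration.

```shell
# Hypothetical helper: reads process lines on stdin.
# Substitutions 1-3 move appID, authenticate and port to the front, in that
# order, so the line ends up starting with: port authenticate appID.
# Substitution 4 keeps only those three fields and prints the line.
# The trailing "sed =" numbers the lines that were actually printed.
extract_with_sed() {
    sed -E -n '/appID=/ {
        s/^(.*[[:space:]])(appID=[^[:space:]]+)/\2 \1/
        s/^(.*[[:space:]])(authenticate=[^[:space:]]+)/\2 \1/
        s/^(.*[[:space:]])(port=[^[:space:]]+)/\2 \1/
        s/^(port=[^[:space:]]+) (authenticate=[^[:space:]]+) (appID=[^[:space:]]+).*/\1 \2 \3/p
    }' | sed =
}

# Sample lines from the question, with the keys in different orders.
printf '%s\n' \
    'process1 port=1234 appID=dummyAppId1 authenticate=true' \
    'process3 port=1244 authenticate=false appID=dummyAppId2' \
    'process2 appID=dummyAppId3 port=1235 authenticate=true' |
    extract_with_sed
```

The fields come out separated by single spaces rather than tabs, because \t in the replacement text is not portable across sed implementations.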
