Text Processing – Extract Multiple Patterns from Line Regardless of Order


I'm new to Unix scripting, so please bear with me.

I am given a file with one process per line, and I need to extract certain parameters of each process from its line.

Example of the file –

process1 port=1234 appID=dummyAppId1 authenticate=true <some more params>
process3 port=1244 authenticate=false appID=dummyAppId2 <some more params>
process2 appID=dummyAppId3 port=1235 authenticate=true <some more params>

The desired output is –

1
port=1234 authenticate=true appID=dummyAppId1 
2
port=1244 authenticate=false appID=dummyAppId2
3
port=1235 authenticate=true appID=dummyAppId3

The numbers 1, 2, and 3 (each on its own line) just number the records in the output file.

I have already tried sed's s command, but the regex is order-specific while the parameters in the input file appear in no fixed order. As a result, lines whose parameters come in a different order are skipped.

Here is my command –

sed -nr '/appID/s/(\w+).*port=([^ ]+) .*authenticate=([^ ]+) .*appID=([^ ]+) .*/\2\t\3\t\4/p' filename | sed =

Could anyone guide me on how to extract those parameters regardless of order?

Thanks!

Edit 1: I managed to use grep's zero-width look-behind assertions (grep -P) this way:

grep -Po '(?<=pattern1=)[^ ,]+|(?<=pattern2=)[^ ,]+|(?<=pattern3=)[^ ,]+|(?<=pattern4=)[^ ,]+' filename

but this prints every match on its own line, e.g.:

1234
true
dummyAppId1

I'm trying to figure out how to get the values from each input line onto a single output line using grep alone (i.e. without merging every X output lines into one afterwards).

Edit 2: Mixed up the order of the parameters in the example input.

Edit 3: I'm sorry, I should have mentioned this earlier: perl seems to be restricted on the machines I work on. The answers provided by Stephane and Sundeep work perfectly when I test them locally, but they won't work on the machines where this finally needs to run.
It looks like awk, grep, and sed are the main supported options 🙁

Best Answer

With awk (tested with GNU awk; the script uses only standard awk features, so it should also work with other implementations):

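$ cat kv.awk
/appID/ {
    a = b = c = ""                       # reset so values never leak from a previous line
    for (i = 1; i <= NF; i++) {          # check every whitespace-separated field
        # assign only when the field starts with the wanted key
        $i ~ /^port=/ && (a = $i)
        $i ~ /^authenticate=/ && (b = $i)
        $i ~ /^appID=/ && (c = $i)
    }
    print NR "\n" a, b, c                # line number, then the fields joined by OFS
}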

$ awk -v OFS='\t' -f kv.awk ip.txt
1
port=1234   authenticate=true   appID=dummyAppId1
2
port=1244   authenticate=false  appID=dummyAppId2
3
port=1235   authenticate=true   appID=dummyAppId3
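Since perl is not available on the target machines, here is a somewhat more general sketch of the same awk approach. It assumes the parameters are whitespace-separated key=value fields whose values contain no spaces; the helper name extract_kv is hypothetical. Instead of hard-coding three variables, it splits each field at the first = into an associative array keyed by parameter name, prints the requested keys in the order they are passed as arguments, skips lines that lack any of them, and numbers the printed records with its own counter rather than NR.

```shell
# extract_kv is a hypothetical helper name, not an existing command.
# Usage: extract_kv KEY... < file
# For each input line that contains every requested KEY=value field, print a
# running record number, then the requested fields (tab-separated, in the
# order the keys were given) on the next line.
extract_kv() {
    awk -v keys="$*" '
    BEGIN { nkeys = split(keys, want, " ") }   # want[1..nkeys] = requested keys
    {
        split("", kv)                          # portable way to empty the array per line
        for (i = 1; i <= NF; i++) {
            eq = index($i, "=")
            if (eq > 1)                        # field looks like key=value
                kv[substr($i, 1, eq - 1)] = $i # e.g. kv["port"] = "port=1234"
        }
        out = ""
        for (k = 1; k <= nkeys; k++) {
            if (!(want[k] in kv))
                next                           # a requested key is missing: skip line
            sep = (k > 1) ? "\t" : ""
            out = out sep kv[want[k]]
        }
        print ++n                              # own counter, so numbering has no gaps
        print out
    }'
}

# Example with the sample lines from the question (trailing params omitted):
printf '%s\n' \
    'process1 port=1234 appID=dummyAppId1 authenticate=true' \
    'process3 port=1244 authenticate=false appID=dummyAppId2' \
    'process2 appID=dummyAppId3 port=1235 authenticate=true' |
    extract_kv port authenticate appID
```

With the sample input this should produce the same layout as the kv.awk transcript above. Passing the keys as arguments means that extracting a fourth parameter only changes the call, not the awk program.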

With perl

$ # note that the order is changed for the second line here
$ cat ip.txt
process1 port=1234 authenticate=true appID=dummyAppId1 <some more params>
process3 port=1244 appID=dummyAppId2 authenticate=false <some more params>
process2 port=1235 authenticate=true appID=dummyAppId3 <some more params>

$ perl -lpe 's/(?=.*(port=[^ ]+))(?=.*(authenticate=[^ ]+))(?=.*(appID=[^ ]+)).*/$1\t$2\t$3/; print $.' ip.txt 
1
port=1234   authenticate=true   appID=dummyAppId1
2
port=1244   authenticate=false  appID=dummyAppId2
3
port=1235   authenticate=true   appID=dummyAppId3

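(?=.*(port=[^ ]+)) is a lookahead whose capture group ($1) grabs the port field wherever it appears on the line
(?=.*(authenticate=[^ ]+)) captures authenticate in $2, and so on; because the lookaheads are zero-width, each one scans from the same starting position, so the order of the fields does not matter
print $. prints the line number
To avoid partial matches (e.g. myport=...), use \bport, \bappID, etc. if a word boundary is enough. Otherwise, use (?<!\S)(port=[^ ]+) to require whitespace or the start of the line before the key.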

If you need to print only lines containing appID (or any other such condition), change -lpe to -lne and replace print $. with print "$.\n$_" if /appID/ (with -n, perl no longer prints every line automatically).

Related Solutions

Bash – Aggregate and group text file in perl or bash

In Perl

perl -F';' -lane 'push @{$h{join ";",@F[0..2]}},$F[3];
                  END{
                    for(sort keys %h){
                        print "$_: ". join ",",@{$h{$_}};
                    }
                  }' your_file

You should be able to do something similar in awk using associative arrays, but I'm not really that well-versed in awk so I can't contribute actual code.

Explanation

Here's an expanded version of the above code that uses as little "magic" as possible:

open(my $FH, "<", "your_file") or die "Cannot open your_file: $!";
while($line=<$FH>){ # For each line in the file (accomplished by -n)
    chomp $line; # Remove the newline at the end (done by -l)
    # The ; is set by -F and storing the split in @F done by -a
    @F = split /;/,$line; # Split the line into fields on ;
    $app_id = join ";",@F[0..2]; # AppID is the first 3 fields
    push @{$h{$app_id}},$F[3]; # Append the 4th field (user ID) to this AppID's array
} # The whole file has been read at this point.
foreach $key (sort keys %h){ # Visit the AppIDs in sorted order
     print "$key: " . join(",",@{$h{$key}}) . "\n"; # Print the AppID and its user IDs
     # The newline ("\n") added at the end is also done by -l
}

Now there is only the push statement left to explain in more detail:

push is usually used to add elements to an array variable. For example:
```
push @a,$x
```
appends the contents of the variable $x to the array @a.
The loop that reads the file line by line fills in a hash (%h). The keys of the hash are the AppIDs, and the value for each key is an array containing all the user IDs associated with that AppID. This array is anonymous (it has no name); in Perl it is implemented as an array reference (somewhat similar to a C pointer). Since the value of %h for the AppID $app_id is $h{$app_id}, writing @{$h{$app_id}} with the Perl array sigil (@) treats that hash value as an array (dereferences the array reference), and push appends the current user ID to it. The first time an AppID is seen, $h{$app_id} does not exist yet; push creates the anonymous array automatically (this is called autovivification).
An alternative that may feel less "Perlish" to you would be to store a plain string per AppID and append the 4th field to it:
```
while(...) { ... $h{$app_id} = defined $h{$app_id} ? $h{$app_id} . ",$F[3]" : $F[3] }
foreach $key (sort keys %h) { print "$key: $h{$key}\n" }
```
where . is Perl's string concatenation operator, and the defined check keeps a comma from being prepended to the first user ID.

Note that in the explanation code, I have omitted the perl -e '...' wrapper so that syntax highlighting can be applied to the code, making it more readable.
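The awk route mentioned at the start of this answer might look like the following minimal sketch. It assumes the same input the Perl one-liner assumes (semicolon-separated lines whose first three fields form the AppID and whose fourth field is the user ID); the function name group_ids and the sample data are made up for illustration. Because awk's for (key in array) loop visits keys in no defined order, the output is piped through sort; LC_ALL=C approximates Perl's default string sort, although sorting whole output lines can differ from sorting bare keys when one key is a prefix of another.

```shell
# group_ids is a hypothetical helper name, not an existing command.
# Usage: group_ids [FILE...]   (reads stdin when no FILE is given)
# Input lines: field1;field2;field3;user_id  (extra fields are ignored)
# Output: "field1;field2;field3: id1,id2,..." with the IDs in input order,
# one line per distinct key.
group_ids() {
    awk -F';' '
    {
        key = $1 ";" $2 ";" $3                 # AppID = first three fields
        if (key in ids)
            ids[key] = ids[key] "," $4         # append to the existing list
        else
            ids[key] = $4                      # first user ID for this AppID
    }
    END {
        # for-in visits keys in an unspecified order, so sort afterwards
        for (key in ids)
            print key ": " ids[key]
    }' "$@" | LC_ALL=C sort
}

# Example (equivalent to running: group_ids your_file)
printf '%s\n' 'a;b;c;u1' 'x;y;z;u2' 'a;b;c;u3' | group_ids
```

Building the comma-separated string directly mirrors the "less Perlish" concatenation variant, since POSIX awk has no arrays of arrays to push onto.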

Grep – Extract Lines Starting with a Sequence and Output to Another File

Try this with sed (any POSIX sed will do, including GNU sed):

sed -n '/^BIHAR/p' file > new_file

or with grep:

grep '^BIHAR' file > new_file

or with awk:

awk '/^BIHAR/' file > new_file
