Linux – Convert grep Output from Long to Wide Format

awk, grep, linux, text processing

I have a file of patterns, and I want to get all the line numbers where each pattern was found, but in wide format (one line per pattern) rather than long format (one line per match).
Example:

fileA.txt

Germany
USA
UK

fileB.txt

USA
USA
Italy
Germany
UK
UK
Canada
Canada
Germany
Australia
USA

I have done something like this:

grep -nf fileA.txt fileB.txt

which returned:

1:USA
2:USA
4:Germany
5:UK
6:UK
9:Germany
11:USA

However, I want to have something like:

Germany 4 9
USA 1 2 11
UK 5 6

Best Answer

Using GNU datamash:

$ grep -n -x -F -f fileA.txt fileB.txt | datamash -s -t : -g 2 collapse 1
Germany:4,9
UK:5,6
USA:1,2,11

This first uses grep to get the lines from fileB.txt that exactly match the lines in fileA.txt, and outputs the matching line numbers along with the lines themselves.

I'm using -x and -F in addition to the options that are used in the question. I do this to avoid reading the patterns from fileA.txt as regular expressions (-F), and to match complete lines, not substrings (-x).
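To illustrate what -x and -F change, here is a small sketch on made-up data. The file names and contents are hypothetical and not part of the question; the pattern U.A is included only to show the regular-expression effect.

```shell
# Hypothetical demo data in a temporary directory
demo_dir=$(mktemp -d)
printf 'UK\nU.A\n' > "$demo_dir/patterns.txt"
printf 'UK\nUKRAINE\nUSA\nU.A\n' > "$demo_dir/data.txt"

# As in the question: patterns are regular expressions and may match substrings
loose=$(grep -n -f "$demo_dir/patterns.txt" "$demo_dir/data.txt")

# With -x and -F: patterns are literal strings that must match the whole line
strict=$(grep -n -x -F -f "$demo_dir/patterns.txt" "$demo_dir/data.txt")

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

Without the extra options, UK also matches UKRAINE as a substring and U.A matches USA as a regular expression; with -x -F only lines 1 and 4 are reported.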

The datamash utility then parses this as lines of :-delimited fields (-t :), sorts and groups them by the second field (-s and -g 2; the countries), and collapses the first field (collapse 1; the line numbers) into a comma-separated list for each country.

You could then replace the colons and commas with tabs using tr ':,' '\t\t', or with spaces in a similar way.

$ grep -n -x -f fileA.txt -F fileB.txt | datamash -s -t : -g 2 collapse 1 | tr ':,' '\t\t'
Germany 4       9
UK      5       6
USA     1       2       11
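If datamash is not available, roughly the same result can be sketched in plain awk. This is an alternative technique, not the approach used above: the function name wide_matches is hypothetical, matching is by exact whole-line comparison (like grep -x -F), and the output keeps the order of the patterns file instead of sorting.

```shell
# Hypothetical helper: print each pattern followed by the line numbers where it occurs.
# $1: patterns file, $2: file to search
wide_matches() {
    awk '
        # First file: remember each distinct pattern in input order
        NR == FNR { if (!($0 in want)) { want[$0] = 1; order[++n] = $0 }; next }
        # Second file: append the line number when the whole line is a known pattern
        $0 in want { hits[$0] = hits[$0] " " FNR }
        END {
            for (i = 1; i <= n; i++)
                if (order[i] in hits) print order[i] hits[order[i]]
        }
    ' "$1" "$2"
}
```

Called as wide_matches fileA.txt fileB.txt on the files from the question, this should print Germany 4 9, USA 1 2 11, and UK 5 6 on separate lines. Patterns with no match are omitted.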

Related Solutions

Grep multiple pattern negative match

If there is an empty line in the patterns file it will match every line, causing no lines to be returned with -v. This is because the lines are interpreted as regular expressions, and an empty regular expression will always match.

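With -F this may behave differently depending on the grep implementation: POSIX specifies that a null string matches every line under -F as well, so it is safer to remove empty lines from the patterns file than to rely on -F to ignore them.
-F causes grep to interpret the lines as simple strings to search for and may speed up grep if regular expressions aren't needed.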
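One way to sidestep the empty-pattern problem is to drop empty lines from the patterns file before passing it to grep -v. The following is a minimal sketch; the function name grep_v_nonempty is hypothetical.

```shell
# Hypothetical helper: run "grep -v -f" after dropping empty lines from the patterns file.
# $1: patterns file, $2: file to filter
grep_v_nonempty() {
    clean=$(mktemp) || return 1
    # "grep ." keeps only lines that contain at least one character
    grep . "$1" > "$clean"
    grep -v -f "$clean" "$2"
    status=$?
    rm -f "$clean"
    return "$status"
}
```

With a patterns file containing foo, an empty line, and bar, a plain grep -v -f prints nothing, while this helper prints the lines that contain neither foo nor bar.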

How to print the inputted pattern which don’t have matching lines

Here's an sh script that produces the results you need.

#!/bin/sh

grep -f /path/to/patterns.txt /path/to/*_856_2017* | sort -u > /path/to/foundFiles.txt 

while read -r LINE
do
    grep -F "$LINE" /path/to/foundFiles.txt
    if [ $? -eq 1 ]
    then
        echo "$LINE" not found
    fi
done < /path/to/patterns.txt

In this script, I assume you store your patterns in the file /path/to/patterns.txt; the results of the first grep are written to the file /path/to/foundFiles.txt.

As you can see, the grep in the loop will reproduce the contents of foundFiles.txt while adding "$LINE not found" for the patterns that have no match.

I also devised a second approach to your case:

#!/bin/sh

grep -f /path/to/patterns.txt /path/to/*_856_2017* |
    sort -u > /path/to/foundFiles.txt

comm -23 /path/to/patterns.txt /path/to/foundFiles.txt |
    xargs -I {} echo {} not found > /path/to/notFoundFiles.txt

cat /path/to/foundFiles.txt /path/to/notFoundFiles.txt > /path/to/finalList.txt

In this case, patterns.txt must already be sorted for comm to work.

The comm command compares the two files and returns the lines present only in patterns.txt (the -23 option suppresses the other two columns), which is the list of patterns not found by grep.

Then, xargs runs echo once per input line (-I {} implies -L 1, so a separate -L 1 is unnecessary), printing the line ({}) with " not found" appended to it. The output of xargs is redirected to the notFoundFiles.txt file.

Finally, you simply concatenate foundFiles.txt and notFoundFiles.txt into finalList.txt.
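The comm step can be tried on its own with a small self-contained sketch. The function name report_not_found is hypothetical, and the sketch makes two assumptions that differ from the script above: patterns are matched as whole literal lines (grep -x -F), so that the found lines are directly comparable with the patterns, and sed is used in place of xargs to append the text.

```shell
# Hypothetical helper: list patterns (one per line) that match no whole line of the data file.
# $1: patterns file, $2: data file
report_not_found() {
    work=$(mktemp -d) || return 1
    # comm needs both inputs sorted the same way, so sort both in the C locale
    LC_ALL=C sort -u "$1" > "$work/patterns.sorted"
    grep -x -F -f "$1" "$2" | LC_ALL=C sort -u > "$work/found.sorted"
    # -23 suppresses lines unique to the second file and lines common to both
    LC_ALL=C comm -23 "$work/patterns.sorted" "$work/found.sorted" |
        sed 's/$/ not found/'
    rm -rf "$work"
}
```

For example, with patterns cherry, apple, and banana and a data file containing only apple and banana lines, the helper prints "cherry not found".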
