Linux – Convert grep Output from Long to Wide Format


I have a file of patterns, and I want to list every line number where each pattern is found, in a wide format (one line per pattern) rather than a long format (one line per match).
Example:

fileA.txt

Germany
USA
UK

fileB.txt

USA
USA
Italy
Germany
UK
UK
Canada
Canada
Germany
Australia
USA

I have done something like this:

grep -nf fileA.txt fileB.txt

which returned:

1:USA
2:USA
4:Germany
5:UK
6:UK
9:Germany
11:USA

However, I want something like:

Germany 4 9
USA 1 2 11
UK 5 6

Best Answer

Using GNU datamash:

$ grep -n -x -F -f fileA.txt fileB.txt | datamash -s -t : -g 2 collapse 1
Germany:4,9
UK:5,6
USA:1,2,11

This first uses grep to get the lines from fileB.txt that exactly match lines in fileA.txt, and to output each matching line prefixed by its line number.

I'm using -x and -F in addition to the options used in the question. -F makes grep treat the patterns in fileA.txt as fixed strings rather than regular expressions, and -x makes it match only complete lines, not substrings.
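To see why -x and -F matter, here is a minimal sketch with hypothetical demo data (the file names patterns.txt and data.txt and their contents are made up for illustration, not taken from the question). It writes to a scratch directory from mktemp -d so it won't touch your own files. "UKRAINE" contains "UK" as a substring, and the pattern "U.K" contains the regex metacharacter ".".

```shell
# Hypothetical demo data in a scratch directory (not the question's files).
# "UKRAINE" contains "UK" as a substring; in "U.K" the "." is a regex
# metacharacter that matches any single character.
dir=$(mktemp -d)
printf 'UK\nU.K\n' > "$dir/patterns.txt"
printf 'UK\nUKRAINE\nUSK\n' > "$dir/data.txt"

# Without -x/-F: substring and regex matches.
# Prints 1:UK, 2:UKRAINE and 3:USK ("U.K" matches "USK" as a regex).
grep -n -f "$dir/patterns.txt" "$dir/data.txt"

# With -x -F: only literal, whole-line matches. Prints 1:UK
grep -n -x -F -f "$dir/patterns.txt" "$dir/data.txt"
```

With the question's country names the difference may never show up, but a pattern such as UK would otherwise also hit a line like UKRAINE.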

The datamash utility then parses this as lines of :-delimited fields (-t :), sorts the input (-s), groups it on the second field (-g 2; the countries), and collapses the first field (collapse 1; the line numbers) into a comma-separated list for each country.
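If datamash isn't installed, the grouping step can be approximated with awk. This is a sketch, not datamash itself: group_collapse is a hypothetical helper name, and it assumes the input has exactly the number:country shape produced by grep -n above. It joins the line numbers for each country in input order, then sorts the groups by country name.

```shell
# Scratch copy of the grep -n output from the question (hypothetical file name).
dir=$(mktemp -d)
printf '1:USA\n2:USA\n4:Germany\n5:UK\n6:UK\n9:Germany\n11:USA\n' > "$dir/matches.txt"

# group_collapse (hypothetical helper): group ':'-separated lines on field 2
# and join the field-1 values of each group with commas, in input order.
# awk's for-in order is unspecified, so the groups are sorted afterwards,
# which mirrors `datamash -s -t : -g 2 collapse 1` for this input.
group_collapse() {
    awk -F: '
        {
            if ($2 in list) list[$2] = list[$2] "," $1
            else            list[$2] = $1
        }
        END { for (k in list) print k ":" list[k] }
    ' "$1" | LC_ALL=C sort -t: -k1,1
}

group_collapse "$dir/matches.txt"
# Germany:4,9
# UK:5,6
# USA:1,2,11
```

In a real pipeline you would pipe grep straight into the awk program instead of reading a file.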

You could then replace the colons and commas with tabs using tr ':,' '\t\t', or with spaces in a similar way.
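For the space-separated layout asked for in the question, the tr step can be tried on its own. This sketch copies the datamash output shown in the answer into a scratch file (a hypothetical name), so it runs without datamash installed.

```shell
# The datamash output shown in the answer, copied to a scratch file
# so that the tr step can be tried without datamash.
dir=$(mktemp -d)
printf 'Germany:4,9\nUK:5,6\nUSA:1,2,11\n' > "$dir/collapsed.txt"

# Translate both ':' and ',' to a single space.
tr ':,' '  ' < "$dir/collapsed.txt"
# Germany 4 9
# UK 5 6
# USA 1 2 11
```

In the full pipeline that would be: grep -n -x -F -f fileA.txt fileB.txt | datamash -s -t : -g 2 collapse 1 | tr ':,' '  '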

$ grep -n -x -F -f fileA.txt fileB.txt | datamash -s -t : -g 2 collapse 1 | tr ':,' '\t\t'
Germany 4       9
UK      5       6
USA     1       2       11
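Note that datamash -s sorts the countries alphabetically, while the desired output in the question lists them in fileA.txt order (Germany, USA, UK). If that order matters, or datamash isn't available, a different technique, the two-file awk idiom (NR == FNR), can do the whole job in one step. This is a sketch under stated assumptions: group_matches is a hypothetical helper name, and matching is exact whole-line equality, like grep -x -F.

```shell
# Recreate the question's input files in a scratch directory.
dir=$(mktemp -d)
printf 'Germany\nUSA\nUK\n' > "$dir/fileA.txt"
printf 'USA\nUSA\nItaly\nGermany\nUK\nUK\nCanada\nCanada\nGermany\nAustralia\nUSA\n' > "$dir/fileB.txt"

# group_matches (hypothetical helper):
# Pass 1 (NR == FNR, still reading the first file): remember each pattern
# and its position. Caveat: if the first file is empty, this condition
# also holds for the second file.
# Pass 2: for each line of the second file that equals a pattern exactly,
# append " <line number>" (FNR) to that pattern's list.
# END: print one line per pattern, in pattern-file order, skipping
# patterns that never matched.
group_matches() {
    awk '
        NR == FNR { order[++n] = $0; want[$0] = 1; next }
        ($0 in want) { hits[$0] = hits[$0] " " FNR }
        END {
            for (i = 1; i <= n; i++)
                if (order[i] in hits) print order[i] hits[order[i]]
        }
    ' "$1" "$2"
}

group_matches "$dir/fileA.txt" "$dir/fileB.txt"
# Germany 4 9
# USA 1 2 11
# UK 5 6
```

With the real files, this would be called as group_matches fileA.txt fileB.txt.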