Compare two files based on the values in their last column

awk, text processing

I have two files, each with several columns and a different number of lines. I need to match lines on the third column in both files, then compare the fourth column of the matching lines and print the whole line with the highest number to a third file. A line whose third-column value is not found in the other file should also be printed, as if it had the highest number.

File A

a b c 10
d e f 11
g h i 15
j k l 15
p l m 35

File B

d e f 15
j k l 20
w x z 40

Required Output

File C

a b c 10
d e f 15
g h i 15
j k l 20
p l m 35
w x z 40

Best Answer

$ cat fileA fileB | sort -k3,3 -k4,4nr | sort -k3,3 -u
a b c 10
d e f 15
g h i 15
j k l 20
p l m 35
w x z 40

This is a pipeline with three parts:

1. Concatenate fileA with fileB.
2. Sort the concatenated lines by the third column and, within each value of the third column, by the fourth column in decreasing numerical order. The result of this step is
   ```
   a b c 10
   d e f 15
   d e f 11
   g h i 15
   j k l 20
   j k l 15
   p l m 35
   w x z 40
   ```
3. Sort this again, using only the third column as the sort key and removing duplicates (`-u`). This keeps the first line found for each sort key and discards the other lines with the same key. Thanks to the first sort, the discarded lines are the ones with lower values in the fourth column, so this gives us the wanted result.

This approach disregards the contents of the first two columns completely.
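Since the question is tagged awk, here is a minimal sketch of the same idea in awk, offered as an alternative rather than as part of the original answer. It keeps, for each value of the third column, the line with the largest fourth column, so it does not depend on which duplicate `sort -u` keeps. The scratch directory and the array names `max` and `line` are my own choices for illustration; the file contents are the sample data from the question.

```shell
# Recreate the sample input from the question in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'a b c 10' 'd e f 11' 'g h i 15' 'j k l 15' 'p l m 35' > "$dir/fileA"
printf '%s\n' 'd e f 15' 'j k l 20' 'w x z 40' > "$dir/fileB"

# For each value of column 3, remember the line with the largest column 4.
# awk does not guarantee the order of "for (key in line)", so the output
# is sorted on column 3 afterwards.
result=$(awk '
    !($3 in max) || $4 + 0 > max[$3] { max[$3] = $4 + 0; line[$3] = $0 }
    END { for (key in line) print line[key] }
' "$dir/fileA" "$dir/fileB" | sort -k3,3)

# Write the result to the third file and show it.
printf '%s\n' "$result" > "$dir/fileC"
cat "$dir/fileC"
rm -r "$dir"
```

Like the `sort` pipeline, this sketch looks only at the third and fourth columns and ignores the first two when matching lines.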

Related Solutions

Lum – Compare two files: lines present in one, not in the other, by one column comparison

join requires that the files be presorted (as they are in the example's arguments to join), so if you need to maintain the sequence of the output, it would need a different approach. Note that it doesn't try to keep the width of the original field spacing.

join -1 2 -2 2 -v 1 <(sort file1) <(sort file2)

output

21 12342 2
21 12349 7
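To make the `join -v` behaviour concrete, here is a small sketch with made-up data; the file contents below are hypothetical and are not the input that produced the output above. It sorts both files on the join field (the second column) into temporary files instead of using process substitution, which is an assumption on my part to keep the example runnable in a plain POSIX shell.

```shell
# Hypothetical sample data; the second column is the comparison key.
dir=$(mktemp -d)
printf '%s\n' 'x 100 a' 'y 200 b' 'z 300 c' > "$dir/file1"
printf '%s\n' 'p 200 q' > "$dir/file2"

# join expects both inputs to be sorted on the join field, here field 2.
sort -k2b,2 "$dir/file1" > "$dir/file1.sorted"
sort -k2b,2 "$dir/file2" > "$dir/file2.sorted"

# -v 1 prints only the lines of the first file with no match in the second.
unpaired=$(join -1 2 -2 2 -v 1 "$dir/file1.sorted" "$dir/file2.sorted")
printf '%s\n' "$unpaired"
rm -r "$dir"
```

With this data the sketch prints `100 x a` and `300 z c`: join writes the join field first, followed by the remaining fields of the line, separated by single spaces.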
