I have a file that contains 4 columns. I want to compare the last three columns and count how many times each combination of values occurs, without deleting any of the lines. I just want the count to appear at the start of each line.
My file looks like this:
ID-jacob 4.0 6.0 42.0
ID-elsa 5.0 8.0 45.0
ID-fred 4.0 6.0 42.0
ID-gerard 6.0 8.0 20.0
ID-trudy 5.0 8.0 45.0
ID-tessa 4.0 6.0 42.0
My desired outcome is:
3 ID-jacob 4.0 6.0 42.0
2 ID-elsa 5.0 8.0 45.0
3 ID-fred 4.0 6.0 42.0
1 ID-gerard 6.0 8.0 20.0
2 ID-trudy 5.0 8.0 45.0
3 ID-tessa 4.0 6.0 42.0
I tried to use sort and uniq, but this only gives me the first line of each group of duplicates:
cat file | sort -k2,4 | uniq -c -f1 > outputfile
Best Answer
You could run into trouble storing large files in memory. This approach is slightly better, as it stores only the matching lines, after sort has done the heavy lifting of putting the lines in order.
It is customary to save awk scripts in a file. You could use this along the lines of
sort -k2,4 file | awk -f script
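As a rough illustration of the approach described above, here is a minimal sketch of how the sort-then-awk pipeline might look. It is not necessarily the script the answer refers to. It assumes whitespace-separated fields and relies on sort placing lines with identical columns 2-4 next to each other. The function name count_groups and the helper emit_group are made up for this example.

```shell
# count_groups: hypothetical helper name, not from the original answer.
# Usage: count_groups file   (or pipe data in on stdin)
count_groups() {
    sort -k2,4 "$@" | awk '
        # Build a key from columns 2-4; SUBSEP avoids accidental collisions.
        { key = $2 SUBSEP $3 SUBSEP $4 }

        # Key changed: the previous group is complete, so print it.
        NR > 1 && key != prev { emit_group() }

        # Buffer the current line as part of the current group.
        { buf[++n] = $0; prev = key }

        # Print the final group once the input is exhausted.
        END { emit_group() }

        # Print each buffered line prefixed with the group size, then reset.
        function emit_group(    i) {
            for (i = 1; i <= n; i++) print n, buf[i]
            n = 0
        }
    '
}
```

Only one group of matching lines is held in memory at a time. The program between the quotes could equally be saved in a file and run with awk -f, as in the command above. Note that the output comes out in sorted order, not in the original line order.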