I have two files, for example:
File1:
Locus_1
Locus_2
Locus_3
File2:
3 3 Locus_1 Locus_40 etc_849
3 2 Locus_2 Locus_94 *
2 2 Locus_6 Locus_1 *
2 3 Locus_3,Locus_4 Locus_50 *
3 3 Locus_9 Locus_3 etc_667
I want to do something like grep -F with File1 as the list of patterns, but matching only against the third column of File2 (in the original File2 the fields are separated by tabs), so that the output is:
Output:
3 3 Locus_1 Locus_40 etc_849
3 2 Locus_2 Locus_94 *
2 3 Locus_3,Locus_4 Locus_50 *
How can I do it?
Edit:
To Chaos: no, the comma is not a mistake. A column can contain more than one Locus_*, and if the second Locus_* (the one after the comma) matches one of the lines of File1, I want that line to be retrieved too!
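A minimal awk sketch of the matching rule described in this edit, assuming File2 is tab-separated and File1 holds one locus name per line: column 3 is split on commas, and the line is kept if any piece is listed in File1. The function name filter_by_col3 and the temporary file names are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: print the lines of file2 ($2) whose 3rd tab-separated
# column contains, after splitting on commas, a locus listed in file1 ($1).
filter_by_col3() {
  awk -F'\t' '
    # First file (NR == FNR): remember each locus name as an array key.
    NR == FNR { wanted[$1] = 1; next }
    # Second file: split column 3 on commas and test each piece exactly.
    {
      n = split($3, parts, ",")
      for (i = 1; i <= n; i++) {
        if (parts[i] in wanted) { print; break }
      }
    }
  ' "$1" "$2"
}

# Demo with the sample data from the question (File2 written tab-separated).
tmp=$(mktemp -d)
printf 'Locus_1\nLocus_2\nLocus_3\n' > "$tmp/file1"
printf '3\t3\tLocus_1\tLocus_40\tetc_849\n3\t2\tLocus_2\tLocus_94\t*\n2\t2\tLocus_6\tLocus_1\t*\n2\t3\tLocus_3,Locus_4\tLocus_50\t*\n3\t3\tLocus_9\tLocus_3\tetc_667\n' > "$tmp/file2"
filter_by_col3 "$tmp/file1" "$tmp/file2"   # prints lines 1, 2 and 4 of file2
rm -r "$tmp"
```

Unlike grep -F, which matches substrings (so Locus_1 would also hit Locus_10), this compares whole comma-separated names, and it keeps File2's original line order and tab separators.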
Best Answer
If grep is not necessary, one simple solution would be to use join for that:

    join -1 1 -2 3 <(sort file1) <(sort -k3,3 file2)

Explanation:

join -1 1 -2 3: join the two files on the first (and only) field of the first file and the third field of the second file; a line is printed when the two fields are equal.
<(sort file1): join needs sorted input.
<(sort -k3,3 file2): the input must be sorted on the join field (the 3rd field here); -k3,3 limits the sort key to that field, whereas -k3 would use everything from field 3 to the end of the line.
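Two details of join matter for this data. By default it prints the join field first, then the remaining fields of file1 and then those of file2, separated by single spaces, so the columns come out in a different order from File2. It also compares whole fields, so Locus_3,Locus_4 does not match Locus_3 and the comma case from the edit is not covered. The sketch below (bash, because of <(...) and $'\t') keeps File2's layout using -t and -o; the function name join_by_col3 is hypothetical, and it assumes File2 always has exactly five tab-separated columns.

```shell
#!/usr/bin/env bash
# Hypothetical helper: same join as above, but
#   -t $'\t'              keeps File2's tab separators in the output
#   -o 2.1,2.2,...,2.5    prints File2's five columns in their original order
# LC_ALL=C on every command makes the ordering independent of the locale.
join_by_col3() {
  LC_ALL=C join -t $'\t' -1 1 -2 3 -o 2.1,2.2,2.3,2.4,2.5 \
    <(LC_ALL=C sort "$1") \
    <(LC_ALL=C sort -t $'\t' -k3,3 "$2")
}

# Demo: the Locus_3,Locus_4 line is not printed, because join needs exact equality.
tmp=$(mktemp -d)
printf 'Locus_1\nLocus_2\nLocus_3\n' > "$tmp/file1"
printf '3\t3\tLocus_1\tLocus_40\tetc_849\n3\t2\tLocus_2\tLocus_94\t*\n2\t3\tLocus_3,Locus_4\tLocus_50\t*\n' > "$tmp/file2"
join_by_col3 "$tmp/file1" "$tmp/file2"
rm -r "$tmp"
```

Note that the output follows the sorted order of the join field rather than File2's original line order.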