I want to take the patterns listed in one file and find them in another file. The second file has those patterns separated by commas.
For example, the first file, F1, has genes:
ENSG00000187546
ENSG00000113492
ENSG00000166971
and the second file, F2, has those genes along with some more columns (five columns) which I need:
region gene chromosome start end
intronic ENSG00000135870 1 173921301 173921301
intergenic ENSG00000166971(dist=56181),ENSG00000103494(dist=37091) 16 53594504 53594504
ncRNA_intronic ENSG00000215231 5 5039185 5039185
intronic ENSG00000157890 15 66353740 66353740
So the gene ENSG00000166971, which is present in the second file, does not show up in grep because it has another gene with it, separated by a comma.
My code is:
grep -f "F1.txt" "F2.txt" >output.txt
I want those values even if only one of them is present, along with the associated data. Is there any way to do this?
Best Answer
What version of grep are you using? I tried your code and it returned results for me.
If you just want the results that match, you can use grep's -o switch to report only the things that match.
grep version
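To compare against your own setup, here is a minimal sketch that prints the grep version and reruns both commands on the sample data from the question. The temporary directory and the file name matched_ids.txt are just illustrative choices, and --version is assumed to be available (it is in GNU and BSD grep, but may not be in minimal implementations).

```shell
# Work in a throwaway directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# F1.txt: the gene IDs from the question, with no trailing whitespace.
printf '%s\n' ENSG00000187546 ENSG00000113492 ENSG00000166971 > F1.txt

# F2.txt: the sample rows from the question.
cat > F2.txt <<'EOF'
region gene chromosome start end
intronic ENSG00000135870 1 173921301 173921301
intergenic ENSG00000166971(dist=56181),ENSG00000103494(dist=37091) 16 53594504 53594504
ncRNA_intronic ENSG00000215231 5 5039185 5039185
intronic ENSG00000157890 15 66353740 66353740
EOF

# Report which grep is installed (first line of the version banner).
grep --version | head -n 1

# Whole matching lines, exactly as in the question.
grep -f F1.txt F2.txt > output.txt

# Only the matched text, one match per line.
grep -o -f F1.txt F2.txt > matched_ids.txt
```

With a clean F1.txt, output.txt should contain the intergenic row, since grep matches a pattern anywhere in the line, including in the middle of a comma-separated field.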
Stray characters in F1.txt?
While debugging this further I noticed several stray spaces at the end of the 2nd line in the file F1.txt. You can see them using hexdump. They show up as hex value 20, which is the ASCII code for a space. You can see them here: 32 20 20 0a (32 is the final digit 2 of the gene ID, each 20 is a space, and 0a is the newline).
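A rough sketch of how you might spot and remove such trailing spaces follows. The sample F1.txt is recreated here with two spaces after the second ID to mimic the bytes above. The names dirty_lines.txt and F1.clean.txt are hypothetical, and od is used only as a fallback in case hexdump is not installed.

```shell
# Work in a throwaway directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate F1.txt with two trailing spaces on the 2nd line (bytes 32 20 20 0a).
printf 'ENSG00000187546\nENSG00000113492  \nENSG00000166971\n' > F1.txt

# Dump the raw bytes; a trailing space shows up as 20 just before the 0a newline.
hexdump -C F1.txt 2>/dev/null || od -c F1.txt

# List the line numbers of lines that end in whitespace.
grep -n '[[:space:]]$' F1.txt > dirty_lines.txt
cat dirty_lines.txt

# Write a cleaned copy with trailing whitespace removed from every line.
sed 's/[[:space:]]*$//' F1.txt > F1.clean.txt
```

The cleaned file can then be passed to grep in place of the original, for example with -f F1.clean.txt. The [[:space:]] class also covers carriage returns, so the same sed command would strip Windows-style line endings if those turn out to be the stray characters instead.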