I have a gene list file, something like this:
SWT21
SSA1
NRP1
EFB1
TFC3
MDM10
I have another file which also contains the names of these genes in my list along with other essential information about them. The second file looks like this:
chrI 147593 151166 YAL001C - TFC3
chrI 143706 147531 YAL002W + VPS8
chrI 142173 143160 YAL003W + EFB1
chrI 140759 141407 YAL004W + YAL004W
chrI 139502 141431 YAL005C - SSA1
chrI 137697 138345 YAL007C - ERP2
chrI 136913 137510 YAL008W + FUN14
chrI 135853 136633 YAL009W + SPO7
chrI 134183 135665 YAL010C - MDM10
I want to extract the lines of the second file whose gene names are present in the first file.
Best Answer
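All you need is a simple grep with three options:
-w : match whole words only, so that ERK1 will not match the gene ERK12 (-w is not a standard option but is fairly common).
-f gene_list.txt : read the patterns to search for, one per line, from gene_list.txt.
-F : treat each pattern as a fixed string rather than a regular expression, so that TOR* (if such a thing existed) would not match TORRRRRR.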
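Putting those options together, the command might look like the sketch below. The name gene_info.txt for the second file is an assumption (only gene_list.txt is named here), and the sample files are recreated in a scratch directory so the example runs on its own.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# The gene list from the question, one name per line.
printf '%s\n' SWT21 SSA1 NRP1 EFB1 TFC3 MDM10 > gene_list.txt

# The annotation file from the question (gene_info.txt is an assumed name).
cat > gene_info.txt <<'EOF'
chrI 147593 151166 YAL001C - TFC3
chrI 143706 147531 YAL002W + VPS8
chrI 142173 143160 YAL003W + EFB1
chrI 140759 141407 YAL004W + YAL004W
chrI 139502 141431 YAL005C - SSA1
chrI 137697 138345 YAL007C - ERP2
chrI 136913 137510 YAL008W + FUN14
chrI 135853 136633 YAL009W + SPO7
chrI 134183 135665 YAL010C - MDM10
EOF

# -w: whole words only, -F: fixed strings, -f: read patterns from gene_list.txt
grep -wFf gene_list.txt gene_info.txt
```

With the sample data from the question, this should print the TFC3, EFB1, SSA1 and MDM10 lines, in the order they appear in the second file.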
NOTE: This assumes that there are no spaces around the gene names in your list. If there are, you will need to remove them first, for example with GNU sed.
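One way to strip the spaces, as a sketch only: it assumes GNU sed (the -i in-place option is a GNU extension and behaves differently elsewhere) and that gene names never contain spaces themselves. The sample list below is made up for illustration.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# A hypothetical gene list with stray spaces around some names.
printf '%s\n' ' SWT21' 'SSA1 ' ' TFC3 ' > gene_list.txt

# Delete every space on every line, editing the file in place (GNU sed).
sed -i 's/ //g' gene_list.txt

cat gene_list.txt
```

Because -i overwrites gene_list.txt, it may be worth keeping a copy of the original list before running this on real data.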