Compare two text files and find matching lines

awk, grep, search, string

I have two files, A and B. File A is short (4 to 6 lines) and looks like this:

GAGA
CAGA
GGGT
TATT

File B is a very large file with thousands of lines. Here is a short excerpt:

AAATGTCAAGAGACAGAAATGTCAAGAGGGT
AAGGGGGTTTATAATCATAAATCAAAGAAAT
ATATACAGAAATGTCAAGAGACAGAAATGTC
TCAAGAGACAGAAATGTCAAGAGGGTCTATA
AAGAGGGTCTATAATCATAAATCAAAGAAAT
AAGAGGGTCTATAATCATAAATCAAAGAAAT
ATACAGAAATGTCAAAACAGAAATGTCAAGG
ATATACAGAATATACAGAAATGTCAAGTTAT
ACAGAATATACAGAAATGTCAAGTTATATAC
ATATACAGAAATGTCAAGAGACAGAAATGTC
TCAGAATATAGTATTCTATTATATACAGAAA
AATATAGTATTCTATTATATACAGAAATGTC
GAATATACAGAAATGTCAAGTTATATACAGA
TATACAGAATATAGTATTCTATTATATACAG
CAGAATATAGTATTCTATTATATACAGAATA
AGTTATATACAGAATATAGTATTCTATTATA
TACAGAATATAGTATTCTATTATATACAGAA
CAGAAATGTCAAGTTATATACAGAATATAGT

I need to search for every string from file A in all the lines of file B, and retrieve the first 10 lines of file B that contain each string from A. I have tried grep and awk, but without good results. Thanks.

Best Answer

Since you have only four to six patterns, why not combine them into a single OR (alternation) pattern? The following example searches the big file, here called "bigDNA.txt", and stops after the first 10 matching lines (10 in total, not 10 per pattern):

grep -E 'GAGA|CAGA|GGGT|TATT' -m 10 bigDNA.txt

To avoid typing the patterns by hand, you can build the alternation from the pattern file, here patt.txt. The following command joins its lines with | (append | to each line, remove the newlines, then remove the trailing |):

grep -E "$(sed 's#$#|#' patt.txt | tr -d '\n' | sed 's#|$##')" -m 10 bigDNA.txt
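If the goal is the first 10 matching lines for each pattern separately, which is one way to read the question, a loop over the pattern file may be closer to what is needed. The following is only a sketch under that assumption: the function name first_matches_per_pattern and the "== pattern ==" header lines are made up for illustration, the patterns are treated as fixed strings rather than regular expressions, and -m requires a grep that supports it (GNU and BSD grep do).

```shell
# Print up to LIMIT (default 10) lines of BIG_FILE for each pattern in PATT_FILE.
# Usage: first_matches_per_pattern PATT_FILE BIG_FILE [LIMIT]
# Hypothetical helper for illustration; patterns are read one per line.
first_matches_per_pattern() {
    patt_file=$1
    big_file=$2
    limit=${3:-10}
    # "|| [ -n "$pat" ]" also handles a last line without a trailing newline.
    while IFS= read -r pat || [ -n "$pat" ]; do
        # Skip empty lines: an empty pattern would match every line.
        [ -n "$pat" ] || continue
        printf '== %s ==\n' "$pat"
        # -F: fixed string, -m: stop after $limit matching lines.
        # grep exits 1 when nothing matches; that is not treated as a failure here.
        grep -F -m "$limit" -e "$pat" -- "$big_file" || true
    done < "$patt_file"
}
```

It could be called as first_matches_per_pattern patt.txt bigDNA.txt. This reads the big file once per pattern, which should be acceptable for four to six patterns. If a combined limit of 10 lines is enough, grep -F -m 10 -f patt.txt bigDNA.txt reads the patterns directly from the file without building the alternation.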