Shell – How to (Memory Limited) > grep -F -f file_A file_B >> output.txt

Tags: grep, linux, scripting, shell-script, text-processing

file_A (~500MB, 1.6M lines) consists entirely of equal-length search terms, one per line, not sorted.

file_B consists entirely of equal-length text lines, one per line, also not sorted.

I've been able to run "grep -F -f file_A file_B >> output.txt" against any size of file_B without problems on a box with 52GB of RAM. The problem is that I'm now limited to 4GB of RAM, and file_A is now too large for this to run without exhausting available memory.

Short of manually chopping file_A into smaller pieces, is there an easy way to script this so it greps for the first 1000 lines of file_A, then, when that's finished, automatically greps for lines 1001-2000, etc., until it has worked through all of file_A?

Best Answer

Loop through file_A in chunks, feeding each chunk to the same grep invocation on stdin as the pattern file (-f -); adjust 1000 to fit your available memory:

nlines=$(wc -l < file_A)   # total number of search terms
chunk=1000                 # terms per grep invocation; tune to available memory
for ((i = 1; i <= nlines; i += chunk))
do
  # print lines i .. i+chunk-1 of file_A and use them as grep's pattern file
  sed -n "$i,+$((chunk - 1))p" file_A | grep -F -f - file_B
done > output
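Two things worth noting about the chunked approach. First, unlike a single grep over the whole pattern file, it can print a line of file_B more than once if that line matches terms in several different chunks; pipe the result through sort -u (or awk '!seen[$0]++' to preserve order) if you need each matching line only once. Second, the sed in the loop re-reads file_A from the top on every iteration. If you'd rather read file_A just once, an equivalent approach splits it into temporary pattern files up front. This is a sketch under the same assumptions; the chunk_ prefix and the temporary directory are arbitrary choices:

tmpdir=$(mktemp -d)
# split file_A once into 1000-line pattern files instead of re-reading it with sed
split -l 1000 file_A "$tmpdir/chunk_"
for f in "$tmpdir"/chunk_*
do
  # each pass loads only ~1000 patterns into grep's memory
  grep -F -f "$f" file_B
done > output
rm -r "$tmpdir"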