Is it faster to look for long or short search terms? Or does it affect speed at all? In other words, should you make search terms as exact as possible?
There are more than 100 000 files, and each file contains from 20 to more than 5000 rows of data. Usually grep is used to find just one instance of the search term.
Let's say the search term is SEARCHTERM, and it will be in a row like this:
NAD+DP+1234567890:92++UNIQUE+NAME+SEARCHTERM++12345+FI'
Is it faster to look for "SEARCH" or "SEARCHTERM"? Let's say that in this case we don't care if we also find matches in other unrelated lines.
This is how I currently do it:
grep NAD+DP 123* | grep SEARCHTERM
Still, I find it quite slow. It usually takes around 3-5 minutes to find the data, even when I know the rough filename, which limits the range to around 10 000 files.
So, would a longer or shorter search term help? As far as I know, grep looks for "blocks" of words of a certain length?
Best Answer
Some reference material:

From Why GNU grep is fast: GNU grep uses the well-known Boyer–Moore algorithm, which looks first for the final letter of the target string and uses a lookup table to tell it how far ahead it can skip in the input whenever it finds a non-matching character.

From the Boyer–Moore string search algorithm article: the maximum skip distance is bounded by the length of the search string, so in general the algorithm runs faster as the search string gets longer.
Conclusion: Use longer strings.
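Applied to the command in the question, that means one pass with the full fixed string; adding -F (fixed-string matching, as in the old fgrep) can also help, since no regex machinery is involved. The filename below is hypothetical, standing in for one of the real data files:

```shell
# Hypothetical sample file in the same format as the real data
printf "NAD+DP+1234567890:92++UNIQUE+NAME+SEARCHTERM++12345+FI'\n" > 123sample

# One pass with the longest literal: -F treats the pattern as a fixed
# string, and a longer literal lets Boyer-Moore skip further per step
grep -F SEARCHTERM 123sample
```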
Now, a bit of benchmarking for fun:
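A minimal sketch of such a benchmark, using a synthetic corpus (one million non-matching lines in the question's format, plus a single matching one):

```shell
# Build a synthetic corpus: 1 000 000 non-matching lines plus one match
seq 1 1000000 | sed "s/.*/NAD+DP+&:92++UNIQUE+NAME+OTHER++12345+FI'/" > corpus.txt
echo "NAD+DP+1234567890:92++UNIQUE+NAME+SEARCHTERM++12345+FI'" >> corpus.txt

# Time the short pattern against the long one; -c just counts matches
time grep -c SEARCH corpus.txt
time grep -c SEARCHTERM corpus.txt
```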
Results: 0.952s is the average for the short string, 0.244s is the average for the long string.
NB: Length is not the only criterion to take into account; how rare the pattern's characters are in the data also affects how far grep can skip, and therefore how fast the search runs.