I explained a similar situation with plain text files on Grep huge number of patterns from huge file. Many people there said I should, so now I'm migrating my data to a sqlite database:
I have a file from which I extract about 10,000 patterns. Then I check if the database doesn't contain such patterns. If it doesn't, I need to save them externally in file
for further processing:
for id in $(grep ^[0-9] keys); do
if [[ -z $(sqlite3 db.sqlite "select id from main where id = $id") ]]; then
echo $id >>file
fi
done
Since I'm new to SQL, I couldn't find a simple way to do this. Also, this loop is useless as it is 20 times slower than what I achieved with awk
on the mentioned URL.
Since the database is huge, keeps growing, and I run this loop very frequently, is it possible to make this faster?
Best Answer
For each pattern, you're invoking a new instance of the
sqlite
program which connects to the database anew. That's a waste. You should build a single query that looks for any of the keys, then execute that one query. Database clients are good at executing large queries.If the matching lines in the
keys
file only contain digits, then you can build the query as follows:For more general input data, you get the idea: use text transformations to build a single large query. Be careful to validate your input; here we make sure that what gets injected into the query is syntactically valid. There's actually a corner case in the example above: if there is no match in the file, then the SQL syntax is invalid; if that might happen, you'll need to treat this case specially. Here's more complex code that takes care of the empty case: