Shell – check patterns that don’t exist in sqlite

efficiencyperformanceshellsqlsqlite

I explained a similar situation with plain text files on Grep huge number of patterns from huge file. Many people there said I should, so now I'm migrating my data to a sqlite database:

I have a file from which I extract about 10,000 patterns. Then I check if the database doesn't contain such patterns. If it doesn't, I need to save them externally in file for further processing:

for id in $(grep ^[0-9] keys); do
  if [[ -z $(sqlite3 db.sqlite "select id from main where id = $id") ]]; then
    echo $id >>file
  fi
done

Since I'm new to SQL, I couldn't find a simple way to do this. Also, this loop is useless as it is 20 times slower than what I achieved with awk on the mentioned URL.

Since the database is huge, keeps growing, and I run this loop very frequently, is it possible to make this faster?

Best Answer

For each pattern, you're invoking a new instance of the sqlite program which connects to the database anew. That's a waste. You should build a single query that looks for any of the keys, then execute that one query. Database clients are good at executing large queries.

If the matching lines in the keys file only contain digits, then you can build the query as follows:

{
  echo 'select id from main where id in (';
  <keys grep -x '[0-9][0-9]*' |     # retain only lines containing only digits
  sed -e '1! s/^/, /' |             # add ", " at the beginning of every line except the first
  echo ');'
} | sqlite3 db.sqlite

For more general input data, you get the idea: use text transformations to build a single large query. Be careful to validate your input; here we make sure that what gets injected into the query is syntactically valid. There's actually a corner case in the example above: if there is no match in the file, then the SQL syntax is invalid; if that might happen, you'll need to treat this case specially. Here's more complex code that takes care of the empty case:

<keys grep -x '[0-9][0-9]*' |
if read first; then {
    echo 'select id from main where id in (' "$first"
    sed -e 's/^/, /'
    echo ');'
  } | sqlite3 db.sqlite
fi
Related Question