SQLite – Multiple Concurrent Connections to Large File on Network Share

sqlite

I have a 7GB SQLite file and I am running multiple queries on it from different virtual machines in the cloud. Some of the queries are running in Python using the built-in sqlite3 package, and others originate from SQL queries running in DB Visualizer. The file itself is on a network share (and for policy reasons cannot be copied locally).

Will these concurrent queries slow each other down?

I could copy the SQLite file multiple times on the network share and connect each concurrent query to its own copy of the file. Would that speed things up?

I can imagine that both the sqlite3 Python driver and DB Visualizer copy the database file into local temporary storage (or into memory) first, in which case there would be no speed-up to be gained. Or they might not, and separate copies would help.

Best Answer

Concurrency is not a problem as long as you only read. SQLite doesn't allow concurrent writes, because it was never designed for that. I don't know the internals of the Python driver or DB Visualizer, but expect SMB, your OS, and any middleware to cache at least parts of the file automatically; unless configured otherwise, this happens by default to speed up filesystem access. The details depend on your OS configuration and access patterns.

I would like to comment on something you didn't ask about, which is the fit of your architecture:

  • You need to access a database over the network
  • You need to have certain level of concurrency
  • You need to control access, for policy reasons (and some of those policy reasons are probably backups)
  • You have expectations about performance
  • You need to control in a more fine-grained way what is on memory and what is on disk
  • Your database has grown beyond a trivial size

(and those are only the ones that are perceived from your question)

SQLite is a great piece of software, but please consider whether it is the right technology for you, as this is almost a paradigmatic case of when not to use it. Sure, you could create a loop device and copy the file into memory, and you might see some improvement, but you could also migrate to a distributed system with an integrated cache (like any other multi-user relational database) and let the technology solve all these issues for you.

Accessing SQLite over a network share is really a hack, and network filesystems are known to have caused data loss due to their limitations around write consistency.
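One partial mitigation, if you stay on the share: SQLite's documented `immutable=1` URI parameter tells the library the file cannot change while open, so it skips file locking entirely, which sidesteps the unreliable lock implementations of network filesystems. This is only safe if the file truly never changes; a write by anyone while readers are open yields undefined results. A sketch (again with a hypothetical local stand-in for the shared file):

```python
import os
import sqlite3
import tempfile

# Hypothetical stand-in for the never-modified database on the share.
path = os.path.join(tempfile.mkdtemp(), "static.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (42)")
conn.commit()
conn.close()

# "immutable=1" (a documented SQLite URI parameter) asserts the file
# will not change; SQLite then performs no locking at all. Safe only
# while that assertion holds.
ro = sqlite3.connect(f"file:{path}?immutable=1", uri=True)
value = ro.execute("SELECT x FROM t").fetchone()[0]
ro.close()
```

If you can't guarantee the file is frozen, stick with plain read-only connections and accept the locking overhead.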