You have answered some of your own questions here; specifically, you have a decent idea about the write lock aspect of the equation: 12,000 inserts/sec gets you to ~60% write lock. That's a reasonable level for consistent performance. You will get some contention, and some ops will be a little slower, but you really want to start worrying at about 80%. Like a lot of things, once you exceed 80% of available capacity you will start hitting issues a lot more often.
In terms of other bottlenecks, and specifically how quickly you can write to disk: this can cause problems, but to look at the relevant stats over time I would recommend installing MMS with the munin-node plugin, which gives you hardware and I/O stats in addition to the MongoDB stats.
When you have that, the metrics you will want to keep an eye on are:
- The Average Flush time (this is how long MongoDB's periodic sync to disk is taking)
- The IOStats in the hardware tab (IOWait in particular)
- Page Faults (if your disk is busy with writes and you need to read data, they are going to be competing for a scarce resource)
It's a bit complicated, but here's the basic idea:
- When average flush time starts to increase, be worried
- If it gets into the multiple second range, you are probably on the limit (though this depends on the volume of data written and the disk speed)
- If it approaches 60 seconds you will see performance degrade severely (the flush happens every 60 seconds, so they would essentially be queuing up)
- High IOWait is going to hinder performance too, especially if you have to read from disk at any point
- Hence looking at page fault levels will also be important
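To make those warning signs concrete, here is a minimal sketch of how you might grade the average flush time from a `db.serverStatus()` document (with pymongo you would get that document via `db.command("serverStatus")`). The field names assumed here (`backgroundFlushing.average_ms`, `extra_info.page_faults`) are from MMAPv1-era serverStatus output, and `flush_warning_level` plus the threshold values are hypothetical illustrations of the heuristics above, not official limits:

```python
# Hypothetical helper: grade the flush-time warning signs described above,
# given a serverStatus document (MMAPv1-era field names assumed).
def flush_warning_level(server_status):
    """Return 'ok', 'warn', or 'critical' based on average flush time."""
    avg_flush_ms = server_status.get("backgroundFlushing", {}).get("average_ms", 0)
    if avg_flush_ms >= 45000:    # approaching the 60s flush interval: flushes queue up
        return "critical"
    if avg_flush_ms >= 1000:     # multiple seconds: probably near the limit
        return "warn"
    return "ok"

# Illustrative serverStatus fragment (made-up numbers, not real output):
status = {
    "backgroundFlushing": {"flushes": 120, "average_ms": 2400.0, "last_ms": 3100},
    "extra_info": {"page_faults": 8821},
}
print(flush_warning_level(status))           # -> warn
print(status["extra_info"]["page_faults"])   # track this alongside IOWait
```

You would track these values over time rather than alerting on a single sample, since it is the trend that matters.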
The other piece of this puzzle, which we have not mentioned yet, is the journal. That will be persisting data to disk as well (by default every 100ms) and so it will be adding to the disk's load if it is on the same volume. Hence if you are seeing high disk utilization, then moving the journal off to another disk would be a good idea.
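As a sketch of moving the journal: by default it lives in `journal/` under the dbpath, and the usual trick is to shut mongod down cleanly, move that directory to the other volume, and leave a symlink behind. The helper below is a hypothetical illustration of that procedure (the paths you pass in would be your own):

```python
import os
import shutil

def relocate_journal(dbpath, fast_volume):
    """Move <dbpath>/journal onto a separate volume and leave a symlink behind.

    mongod must be shut down cleanly before doing this, and restarted after.
    """
    journal = os.path.join(dbpath, "journal")
    target = os.path.join(fast_volume, "journal")
    shutil.move(journal, target)   # works across filesystems, unlike os.rename
    os.symlink(target, journal)    # mongod follows the symlink transparently
```

After restarting mongod, journal writes then land on the second volume and stop competing with the data files for the same disk.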
There are no real "magic numbers" to stay under, in most cases it's all relative, so get a good baseline for your normal traffic, check to see if things are trending up and maybe load test to see what your limits are and when things start to degrade and you will be in good shape.
After all that preamble, on to some of your questions:
> What happens if there are more inserts per second than mongod is able to save to the hard disk? Will there be any warning or will it simply fail silently?
If you start to stress the disk to the levels described above, eventually everything is going to slow down, and at some point (depending on timeouts, how beefy your hardware is, and how you handle exceptions) your writes will fail. If you are using a recent version of pymongo, you will be using safe writes by default, and those will then fail with an exception. If you wish to be a little more paranoid, you can occasionally do a write with a write concern of j:true, which will not return OK until the write has made it to the journal (i.e. on disk). This will, of course, be slower than a normal safe write, but it is an immediate indication of disk-capacity issues, and you could use it to block/queue other operations and essentially act as a throttle to prevent your database from being overwhelmed.
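The throttle idea can be sketched as follows. To keep the example self-contained it takes the journaled write as a callable rather than connecting to a server; with pymongo 2.x that callable might be something like `lambda: collection.insert({"probe": 1}, j=True)`. The `journaled_probe` helper and its backoff numbers are hypothetical:

```python
import time

def journaled_probe(insert_fn, max_retries=3, backoff_s=0.01):
    """Occasionally issue a j:true write; back off if the journal can't keep up.

    insert_fn is expected to perform one journaled write and raise on failure
    (pymongo raises OperationFailure when a j:true write times out).
    Returns True if the probe succeeded, False if writes should be throttled.
    """
    for attempt in range(max_retries):
        try:
            insert_fn()
            return True
        except Exception:
            time.sleep(backoff_s * (2 ** attempt))   # exponential backoff
    return False

# Demonstration with a stub that fails twice, then succeeds:
calls = {"n": 0}
def flaky_insert():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("journal commit timed out")

print(journaled_probe(flaky_insert))  # -> True (succeeded on the third try)
```

When the probe returns False, your application can pause or queue its normal (faster) writes until the disk catches up.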
> I am thinking about a simple replication setup using one master and one slave. Does the initial sync or a resync process lock the databases?
I think I covered locking overall at the start, but to answer this piece specifically: first, make sure you are using a replica set, not master/slave. The master/slave implementation is deprecated and not recommended for general use. As for the initial sync, it will add some load to the primary in terms of reads, but not in terms of writes, so you should be fine in terms of locking.
> What happens to my data if the write queue increases in the long term?
As you can probably tell from the explanation above, the answer is very much dependent on how you write your application, how you choose to have your writes acknowledged, and how much capacity you have available. You can, essentially, be as safe as you wish when it comes to writing to disk on MongoDB, but there is a performance trade-off, as mentioned in the j:true discussion above.
Generally, you want to figure out your limiting factor - be it locking, disk speed etc. and then track the levels over time and scale out (sharding) or up (better hardware) before you hit a hard limit and see performance problems.
One last thing: db.serverStatus().writeBacksQueued is a metric that will only ever be non-zero in a sharded environment, and it has to do with making sure that writes to a chunk during a migration are dealt with appropriately (handled by the writeback listener). Hence it is essentially a red herring here, with nothing to do with general write volume.
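If you still want to watch that field for completeness, a trivial check like the hypothetical helper below is enough (the field is a boolean in serverStatus output, and should only ever be true behind a mongos):

```python
def writebacks_queued(server_status):
    """writeBacksQueued is only meaningful with sharding in play;
    on a standalone or plain replica set it should always be falsy."""
    return bool(server_status.get("writeBacksQueued", False))

print(writebacks_queued({"writeBacksQueued": False}))  # -> False
```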
Option 1: Set up a sharded cluster to distribute writes. However, a sharded cluster brings complexity in terms of the additional components needed (mongos, config servers, multiple replica sets, etc.), along with operational overhead you'll need to deal with (the balancer, chunk migration, no point-in-time backup/restore, etc.).
Option 2: Set the default writeConcern to 0, a.k.a. 'fire and forget' mode. Basically, your application writes to the mongod process (the primary, if you use a replica set) and receives no acknowledgement back, so failed writes pass silently.
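One way to set that default is via the connection string, since `w` is a standard connection-string write concern option. A minimal sketch (the host, port, and database name here are placeholders):

```python
def fire_and_forget_uri(host="localhost", port=27017, db="mydb"):
    """Build a MongoDB connection string whose default write concern is w=0.

    With w=0 the driver sends writes without waiting for acknowledgement, so
    failures (including the disk-pressure scenarios above) pass silently.
    With pymongo you could equivalently pass w=0 to MongoClient.
    """
    return "mongodb://%s:%d/%s?w=0" % (host, port, db)

print(fire_and_forget_uri())  # -> mongodb://localhost:27017/mydb?w=0
```

This trades durability for throughput, so it only makes sense when losing some writes is acceptable.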