MongoDB – Best design of a Mongo replica set for DB read/write speed

mongodb · performance · replication

I'm building a Node app that uses MongoDB. Whilst it would probably be fine to have Node and Mongo sitting on the same VPS, I want to ensure that MongoDB is backed up and remains highly available if the VPS goes offline.

Looking into the options available to me, one stood out: creating a replica set.

I have a couple of questions about how best to design the replica set to ensure the maximum speed of my app (as it does a lot of DB reads and writes):

  1. Surely running Mongo on a remote server will slow things down considerably? Instead of performing reads and writes against a local disk, every DB operation has to travel across datacenters to the remote server. For example, if the round-trip time between the two servers were 100 ms, making 10 sequential requests would add a whole second of network latency!

  2. With that in mind, would the following setup work?

    • Server 1: Node + MongoDB (primary member of replica set)
    • Server 2: MongoDB secondary member – located in a separate datacenter, ideally close to Server 1
    • Server 3: MongoDB secondary member – located in a separate datacenter, ideally close to Server 1

That way, whilst everything is working normally, I benefit from the speed of executing DB commands on the same server as the Node app; and when things go wrong, I still have redundancy (a sketch of initiating such a set follows).
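
For illustration, a minimal sketch of how such a set might be initiated from mongosh. The hostnames and the set name rs0 are placeholders; giving Server 1 a higher priority keeps it preferred as primary, so the app normally talks to a local mongod and only fails over when it must:

    // Run in mongosh on Server 1 (hostnames below are placeholders).
    // priority: 2 makes Server 1 the preferred primary.
    rs.initiate({
      _id: "rs0",
      members: [
        { _id: 0, host: "server1.example.com:27017", priority: 2 },
        { _id: 1, host: "server2.example.com:27017", priority: 1 },
        { _id: 2, host: "server3.example.com:27017", priority: 1 }
      ]
    })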

Best Answer

It is not a great practice to mix the app and the database on the same server: you end up competing for resources. You also mentioned remote execution versus local execution on the same disk. I'd contend that you could suffer from disk contention if it were all on the same disk, and that the overhead of remote replication is small by comparison. A lot of factors play into that, including the type of disk(s) and whether you separate the OS, logs, and data.
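
As a rough sketch of that last point, a mongod.conf might put the data files and logs on separate volumes (the paths here are assumptions, not a recommendation for any particular layout):

    # Hypothetical mongod.conf excerpt: data and logs on separate volumes.
    storage:
      dbPath: /mnt/data/mongodb            # dedicated data volume
    systemLog:
      destination: file
      path: /mnt/logs/mongodb/mongod.log   # separate log volume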

Regarding your topology: yes, a best practice in any DB technology is to place replica set members in different datacenters. Depending on your disaster recovery requirements, having at least one of those close to the primary node would be beneficial. There are other considerations too, such as a regional emergency (on one coast of the US, for example) that makes all three members unavailable at once; that is a case-by-case scenario to weigh.

You should also consider your read and write concerns when addressing performance and availability. The default write concern is 1, i.e. acknowledgement from the primary only. If durability matters more to you than raw write speed, you may want a write concern of 2 or majority in the topology you describe. Only testing your own workload will prove what is right for your use case, and since write concern can be set per operation, less important writes can use a cheaper one.

You should explore bulk inserts as well. Personally, for time-series data I have found the default and maximum batch size of 1,000 to work very well. I've also had good results increasing it to 5,000 and letting the DB engine break the batches out into 1,000-document chunks. Beyond that, inserts per second have been poorer.
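
To make the per-operation write concern concrete, here is a hedged sketch using the Node.js driver; the database and collection names are made up, and the connection string assumes the three-member set described in the question:

    // Sketch only: connection string and names are illustrative.
    const { MongoClient } = require('mongodb');

    const client = new MongoClient(
      'mongodb://server1:27017,server2:27017,server3:27017/?replicaSet=rs0'
    );

    async function run() {
      await client.connect();
      const orders = client.db('app').collection('orders');

      // Critical write: wait until a majority of members have the write.
      await orders.insertOne(
        { status: 'paid' },
        { writeConcern: { w: 'majority' } }
      );

      // Less important write: primary acknowledgement only (the default, w: 1).
      await orders.insertOne({ status: 'viewed' }, { writeConcern: { w: 1 } });

      await client.close();
    }

    run().catch(console.error);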
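
And a sketch of batched bulk inserts along the lines described, assuming an array of time-series documents called points; the batch size is the knob to test against your own workload:

    // Sketch: insert `points` in batches; tune batchSize for your workload.
    async function insertInBatches(collection, points, batchSize = 5000) {
      for (let i = 0; i < points.length; i += batchSize) {
        const batch = points.slice(i, i + batchSize);
        // ordered: false lets the server continue past individual failures;
        // the server splits oversized batches to its own maximum internally.
        await collection.insertMany(batch, { ordered: false });
      }
    }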

It is quite a lengthy topic, but these are some ideas.