Mongodb – Can MongoDB be configured to sit behind a load balancer?


According to this post:

In a single replica set, you cannot distribute writes, they all must
go to the primary. You can distribute reads to the secondaries
already, via Read Preferences as you deem appropriate. The driver
keeps track of what is a primary and what is a secondary and routes
queries appropriately.

According to the Mongo docs:

You may also deploy a group of mongos instances and use a proxy/load
balancer between the application and the mongos. In these deployments,
you must configure the load balancer for client affinity so that every
connection from a single client reaches the same mongos.
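The client-affinity requirement in the quoted docs can be illustrated with a toy mapping a load balancer might apply: hash the client's address and always send that client to the same mongos. This is a minimal sketch, assuming hypothetical router addresses (MONGOS_POOL) and source-address affinity. Real load balancers implement persistence in their own configuration, not in application code.

```python
import hashlib

# Hypothetical mongos router addresses, for illustration only.
MONGOS_POOL = ["mongos-a:27017", "mongos-b:27017", "mongos-c:27017"]


def pick_mongos(client_addr: str, pool: list[str] = MONGOS_POOL) -> str:
    """Map a client address to one mongos so every connection from that
    client reaches the same router (client affinity).

    A stable hash (SHA-256) is used instead of Python's built-in hash(),
    which is randomized per process for strings.
    """
    digest = hashlib.sha256(client_addr.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(pool)
    return pool[index]


if __name__ == "__main__":
    for client in ["10.0.0.5", "10.0.0.6", "10.0.0.5"]:
        print(client, "->", pick_mongos(client))
```

Because the mapping is a plain modulo over the pool size, adding or removing a mongos remaps many clients to a different router.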

So basically, it seems like if you've got a single replica set of 3 nodes, you can't really use a proxy/load balancer since all writes need to go to the primary and you need client affinity… so all reads also need to go to the primary.

What I'm thinking though is that it might be possible to have applications connect to a load balancer. The load balancer would route all requests to the primary (not very balanced, but whatever)… until/unless the primary went down – at which point the load balancer would start routing requests to a "new primary".

I'm not sure this is possible, however: how would the load balancer know which Mongo server had been elected the new primary (and thus where to route new requests)?

Assuming it were possible, this would provide a degree of redundancy in case the primary ever goes down… I'm also hoping it would have the side effect of avoiding stale writes when a network partition occurs, since the load balancer (and thus all DB clients) would only ever connect to a single primary.

Or is this a stupid question…
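The load-balancer idea in the question can be sketched as a health check that routes only to the member reporting itself as primary. This is a minimal sketch, assuming a simplified, hypothetical status format loosely modeled on the ismaster flag of isMaster. The election_term field is an assumption standing in for the election identifiers that real drivers compare when two members briefly both claim to be primary.

```python
from typing import Optional


def choose_backend(statuses: list[dict]) -> Optional[str]:
    """Return the host a load balancer should route to, or None.

    Each status is a hypothetical health-check result of the form
    {"host": str, "is_primary": bool, "election_term": int}.
    If several members claim to be primary (e.g. briefly during a
    network partition), prefer the one from the most recent election.
    """
    primaries = [s for s in statuses if s.get("is_primary")]
    if not primaries:
        return None  # no reachable primary: refuse writes rather than guess
    newest = max(primaries, key=lambda s: s["election_term"])
    return newest["host"]


if __name__ == "__main__":
    before = [
        {"host": "a:27017", "is_primary": True, "election_term": 1},
        {"host": "b:27017", "is_primary": False, "election_term": 1},
    ]
    after = [
        {"host": "b:27017", "is_primary": True, "election_term": 2},
        {"host": "c:27017", "is_primary": False, "election_term": 2},
    ]
    print(choose_backend(before), "->", choose_backend(after))
```

A check like this only sees the members it can reach, so on its own it narrows, but does not close, the partition window the question worries about.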

Best Answer

You need to read closely. A mongos is the query router providing access to a sharded cluster. A mongos is well aware of the underlying replica sets, the (re-)elections and, last but not least, which node is the primary of each shard's replica set.

Having multiple mongos instances has various advantages. A usual setup is to run one mongos per application server. That setup may be undesirable, for example because your application servers scale automatically based on load. Instead, you can set up a number of machines running mongos query routers and list all of these instances in the connection string your application servers use. The problem here is that, depending on the driver, all queries may go to the first mongos listed. To circumvent that, you could put a TCP load balancer in front of your mongos instances.

For replica sets, none of this is necessary. First of all, most (that is, all major) drivers are aware that they are connecting to a replica set, if properly configured. Replica-set-aware drivers automatically determine the current primary for writes. For a form of load balancing, there is a concept called read preference. Simplified: on a per-query basis, you can choose to read from a secondary, accepting the possibility of reading outdated data, as per eventual consistency. Again, (most) drivers are aware of that, and there is no need for a load balancer.
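The routing a replica-set-aware driver performs can be illustrated with a toy model: writes always go to the primary, while reads follow the read preference. This is a simplified sketch, not any driver's API. Member, route and the random choice among secondaries are illustrative assumptions; real drivers pick among eligible members using a latency window and also support modes such as primaryPreferred and nearest, which are omitted here.

```python
import random
from dataclasses import dataclass

SUPPORTED_MODES = {"primary", "secondary", "secondaryPreferred"}


@dataclass
class Member:
    host: str
    state: str  # simplified replica set state: "PRIMARY" or "SECONDARY"


def route(members, operation, read_preference="primary", rng=random):
    """Pick the member a replica-set-aware client would send an operation to.

    Writes always go to the primary. Reads follow the read preference:
    "primary" -> primary only; "secondary" -> any secondary (error if none);
    "secondaryPreferred" -> a secondary if available, else the primary.
    """
    if read_preference not in SUPPORTED_MODES:
        raise ValueError("unsupported read preference: " + read_preference)
    primary = next((m for m in members if m.state == "PRIMARY"), None)
    secondaries = [m for m in members if m.state == "SECONDARY"]
    if operation == "write" or read_preference == "primary":
        if primary is None:
            raise RuntimeError("no primary available")
        return primary
    if secondaries:
        return rng.choice(secondaries)
    if read_preference == "secondaryPreferred" and primary is not None:
        return primary
    raise RuntimeError("no suitable member for " + read_preference)


if __name__ == "__main__":
    rs = [Member("a:27017", "PRIMARY"), Member("b:27017", "SECONDARY")]
    print(route(rs, "write").host, route(rs, "read", "secondary").host)
```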

Related Solutions

Mongodb: replica-set processing reads on the primary

I'm not sure what documentation you've read so apologies if I'm repeating anything here.

To distribute reads to secondary nodes, most drivers allow you to set a readPreference value for the current session. Clients set read preference on a per-connection basis. With slaveOk set, the driver should always send queries to the secondaries, if they're available.

With PyMongo 2.x, distributing reads to secondaries requires the use of ReplicaSetConnection with ReadPreference.SECONDARY.

See “rs.slaveOk()” for more information and this link.

In the mongo shell, to enable secondary reads, issue the following command:

rs.slaveOk()

The PHP documentation for it is here, but I'm guessing that may be the documentation you're referring to.

FYI, here's an old discussion about it on the MongoDB Google Group.

If you're still having issues, I'd recommend using the MongoDB Google Group and providing some further information such as the version of MongoDB you're using, the version of the PHP driver, your log files, rs.conf() and rs.status().

FYI, you have to be careful with read scaling, as sending too many reads to the secondaries can often result in the secondaries lagging behind the primary and becoming stale, thus requiring a full resync.
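One way to keep lagging secondaries out of the read rotation is to compare each secondary's last applied write with the primary's, similar to comparing the optimeDate values reported by rs.status(). This is a minimal sketch, assuming optimes simplified to seconds since the epoch and a hypothetical max_lag_seconds threshold; drivers that support the maxStalenessSeconds read preference option apply a related, more careful rule.

```python
def fresh_secondaries(
    primary_optime: float,
    secondary_optimes: dict[str, float],
    max_lag_seconds: float,
) -> list[str]:
    """Return the secondaries whose replication lag is within the limit.

    Optimes are simplified to seconds since the epoch of the last applied
    write; lag = primary_optime - secondary_optime.
    """
    return sorted(
        host
        for host, optime in secondary_optimes.items()
        if primary_optime - optime <= max_lag_seconds
    )


if __name__ == "__main__":
    # b is 5 s behind the primary, c is 100 s behind.
    print(fresh_secondaries(1000.0, {"b": 995.0, "c": 900.0}, 30))
```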

Mongodb – Mongo Replication with VIP

Community Wiki answer generated from comments on the question by James Wahlin and Markus W Mahlberg.

James: The arbiter is a voting member in an election but doesn't appoint members as primary or control the process. I would suggest reading: Replica Set Elections in the MongoDB manual.
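The arbiter's role as a voter, not a candidate, can be illustrated with a toy majority check. This is a simplified sketch, not MongoDB's election protocol: it assumes "up" means reachable by the other up members and ignores priorities, member freshness and terms. Voter and can_elect_primary are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Voter:
    name: str
    up: bool
    arbiter: bool = False  # arbiters vote but never become primary


def can_elect_primary(voters: list[Voter]) -> bool:
    """Return True if a primary could be elected in this toy model.

    Requires a strict majority of all voting members to be up, and at
    least one up member that is data-bearing (i.e. not an arbiter).
    """
    up = [v for v in voters if v.up]
    has_majority = len(up) > len(voters) // 2
    has_candidate = any(not v.arbiter for v in up)
    return has_majority and has_candidate


if __name__ == "__main__":
    # Primary down: secondary + arbiter still hold 2 of 3 votes.
    rs = [Voter("a", False), Voter("b", True), Voter("arb", True, arbiter=True)]
    print(can_elect_primary(rs))
```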

When there is an election, MongoDB drops all client connections. The client then connects to one of its seeds and issues an isMaster call, which reports back which member is primary and which members are secondaries. In that way the client seamlessly writes to the new primary and routes reads according to its read preference.
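The rediscovery step can be sketched as a loop over the seed list. This is a simplified model, not a driver implementation: is_master(host) is a hypothetical stand-in for running isMaster against a host, returning None when the host is unreachable and otherwise a dict with the ismaster flag and, when known, the primary field naming the current primary's address.

```python
from typing import Callable, Optional


def find_primary(
    seeds: list[str],
    is_master: Callable[[str], Optional[dict]],
) -> Optional[str]:
    """Ask each seed who the primary is, as a replica-set-aware client
    does after its connections have been dropped."""
    for host in seeds:
        reply = is_master(host)
        if reply is None:
            continue  # seed is down; try the next one
        if reply.get("ismaster"):
            return host  # this seed is the primary itself
        if reply.get("primary"):
            return reply["primary"]  # a secondary named the primary
    return None  # no primary known yet, e.g. election still in progress


if __name__ == "__main__":
    replies = {
        "a:27017": None,  # old primary was killed
        "b:27017": {"ismaster": False, "primary": "c:27017"},
        "c:27017": {"ismaster": True},
    }
    print(find_primary(list(replies), replies.get))
```

A real driver keeps monitoring all members and retries until an election completes; this sketch only performs one pass.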

Markus: It will route requests from the application to the secondary. The drivers are replica set aware. Since all connections are either actively dropped by the old primary or stale, the driver will try to find the current primary in the list of replica set members it knows. You can verify this behaviour by setting up a replica set locally, running on 3 different ports on your local machine, and killing one of them at random. After the short time an election takes to finish, your client application writes to the newly elected primary.
