MongoDB – Setting Up Automatic Failover Without Data Duplication

failover mongodb

Because we don't have much storage space and we already have a RAID system for data availability, we don't want to use MongoDB's replica sets, which duplicate the data.

I just want to know whether MongoDB's replication allows automatic failover of mongod instances without duplicating the data. I only need the MongoDB service itself to be available; the availability of the data is already covered.

I read the replication section on the MongoDB site but didn't find any mention of my use case. What I want exactly is to have 2 machines, each running a mongod instance, with automatic failover between them. They are connected to servers which store the data (no additional duplication of data, as we already have our RAID system). If automatic failover is not possible in this case, we may need to do it manually.

Any experiences, suggestions or related links are welcome. Thanks in advance.

Best Answer

Your setup is plainly wrong.

First, what sense does it make to have automatic failover when your storage system or the connection to it creates a single point of failure? And if you have a storage system which eliminates every single point of failure (redundancy in power, network interfaces plus network infrastructure, RAID controllers, mainboards and their RAM), it would be much more expensive than setting up two simple boxes (plus a virtualized arbiter) in a replica set. And you would still gain nothing over the much simpler and easier solution.
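For illustration, the replica set suggested here (two data-bearing boxes plus an arbiter) could be described by a configuration document like the one built below. This is only a minimal sketch: the set name `rs0`, the host names and the helper function are hypothetical, and the resulting document would still have to be passed to `rs.initiate()` in the mongo shell on one of the members.

```python
def build_replica_set_config(set_name, data_hosts, arbiter_host):
    """Build a replica set config document: data-bearing members plus one arbiter."""
    # One entry per data-bearing node; each holds a full copy of the data.
    members = [{"_id": i, "host": host} for i, host in enumerate(data_hosts)]
    # The arbiter only votes in elections and stores no data.
    members.append({"_id": len(members), "host": arbiter_host, "arbiterOnly": True})
    return {"_id": set_name, "members": members}


# Hypothetical set name and host names, for illustration only.
config = build_replica_set_config(
    "rs0",
    ["db1.example.net:27017", "db2.example.net:27017"],
    "arbiter.example.net:27017",
)
```

The arbiter is what lets a two-box setup keep a voting majority when one box goes down, without a third copy of the data.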

Next, the backup issue. Granted, SAN snapshots are probably the easiest way to create backups, and they are pretty fast. However, with only a single instance of the data, taking a snapshot interrupts the service, however briefly.

Third, a situation dreaded in all high availability scenarios: split brain. How would you deal with it when both data-bearing nodes try to access the same data set? One node would hold the lock on the data set but be demoted to secondary by an election, while the newly elected primary could not get hold of the data set because the other instance is still holding the lock, so it would step down again. Let me note that this scenario could only very theoretically take place, if you started a second node by means of HALinux or something with a very strange setup. You would then have to deal with something like STONITH, which comes with its very own and sometimes rather delicate problems, further increasing the most expensive resource you have: continuous administrative costs. A classical HA setup requires 24/7 monitoring and response times measured in seconds rather than in several minutes. MongoDB's failover capabilities are, when set up correctly, reliable and usually work without manual intervention except in rare edge cases. Edge cases of a failover, that is. So the edge cases of already rare edge cases.
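By contrast, a regular replica set avoids split brain because a member can only be elected primary, and stay primary, while it can reach a strict majority of the voting members. The rule can be sketched as follows; the function name is hypothetical and this is an illustration of the idea, not MongoDB's actual election code.

```python
def has_majority(reachable_voters, total_voters):
    """Return True if a node that can see `reachable_voters` voting members
    (itself included) out of `total_voters` holds a strict majority."""
    return reachable_voters > total_voters // 2


# Two data nodes plus an arbiter: 3 voters in total.
survives_one_failure = has_majority(2, 3)   # one box down, the rest still elect a primary
isolated_node = has_majority(1, 3)          # a cut-off node cannot become or stay primary
two_nodes_no_arbiter = has_majority(1, 2)   # without an arbiter, one failure blocks elections
```

With shared storage and no replica set, nothing enforces such a rule, which is why external fencing like STONITH becomes necessary.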

That said: no, what you want is not possible with MongoDB out of the box. The reason is that lock files are created in the data directory, preventing the second server from accessing the same data. So what you would have to do is set up some HALinux, remove that lock file before firing up the second server and then bring up the IP address. This comes with some other serious drawbacks from the MongoDB side, however. The data may be inconsistent, as the first server might not have been able to flush its data to the data files. On second thought, it might well be that the lock is only checked during startup, which might actually lead to the two servers flushing their data to the same data set, resulting in data FUBAR.
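To make the lock file point concrete, here is a rough sketch of how a failover script might check whether a data directory looks like it is in use. It assumes that mongod keeps a `mongod.lock` file in the data directory which contains its PID while running and is left empty after a clean shutdown; that behaviour may differ between versions and storage engines. The function name is hypothetical, and a non-empty file can also just mean an unclean shutdown, so this is a heuristic and not a safe fencing mechanism.

```python
import os


def dbpath_looks_in_use(dbpath):
    """Heuristic: a non-empty mongod.lock suggests a running mongod
    or an unclean shutdown. Assumes the lock file behaviour described above."""
    lock_path = os.path.join(dbpath, "mongod.lock")
    try:
        return os.path.getsize(lock_path) > 0
    except OSError:
        # No lock file (or unreadable path): nothing indicates an active instance.
        return False
```

Even if such a check passes, it says nothing about whether the first server managed to flush its data, which is exactly the consistency problem described above.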

TL;DR: Use MongoDB (or any other tool, for that matter) within its intended environment and for its intended purpose. Don't do it as you originally planned. It is not worth the effort, and the only advantage is saving disk space, which is relatively cheap compared to the costs your approach would create. Don't take chances with your production data.