As the page you linked implies, any point-in-time snapshot technique that includes the data files and the journal will suffice; LVM is just one option. EBS snapshots in Amazon EC2 will also work, as will similar snapshot solutions on SAN, NAS, etc. You are not limited to LVM, but that is generally a solution people can implement themselves.
In terms of whether you can copy files to perform a backup, the answer is yes, but only if you stop all writes to the node you are backing up (thereby guaranteeing no changes to the files during the copy). You can do this in a couple of ways:
The most straightforward way is to shut the node down (this should be a secondary), copy the files, then start the node back up and let it catch up with the primary (check the optime using `rs.status()`). Rinse and repeat (if you wish) to cycle through all nodes in the set, though the nodes are all identical, so one copy should generally be enough.
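As a rough sketch of the clean-shutdown method (host names and paths below are placeholders, not from the original answer):

```javascript
// Hypothetical sketch: back up a secondary via clean shutdown and file copy.

// 1. Connect to the backup secondary and shut it down cleanly:
db.getSiblingDB("admin").shutdownServer();

// 2. With the process stopped, copy the data files at the OS level, e.g.:
//      cp -a /var/lib/mongodb /backups/mongodb-$(date +%F)

// 3. Restart the mongod, then confirm it has caught up by comparing
//    optimes across the members of the set:
rs.status().members.forEach(function (m) {
    print(m.name + " " + m.stateStr + " optime: " + tojson(m.optime));
});
```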
The second way (mentioned by sysadmin1138) is to fsync (flush data to disk) and lock (prevent writes) the node but leave it running, using the `fsyncLock` command (again, this should be a secondary). Once you have completed the copy, you unlock the database using the `fsyncUnlock` command. There are dangers inherent in this technique - for example (and particularly if you are using authentication), you should always lock and unlock on the same connection, otherwise you risk locking yourself out of the database and having to kill the process to recover.
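In the mongo shell, the lock/copy/unlock sequence looks like this (using the standard shell helpers that wrap the fsync command; note that both calls happen on the same session):

```javascript
// Run against the backup secondary, keeping ONE shell session open throughout.
var admin = db.getSiblingDB("admin");

// Flush pending writes to disk and block further writes:
admin.fsyncLock();

// ... copy the data files at the OS level while the node is locked ...

// Release the lock on the SAME connection you used to take it:
admin.fsyncUnlock();
```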
As for other risks, it is common to use a hidden node for backups in either case, which prevents accidentally attempting reads from the node while it is behind and/or while it is locked (depending on your method).
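A hidden backup member is configured with `priority: 0` and `hidden: true`; a minimal sketch (the host name is a placeholder):

```javascript
// Add a hidden, non-electable member dedicated to backups.
// "backup.example.net" is a hypothetical host name.
rs.add({
    host: "backup.example.net:27017",
    priority: 0,   // never eligible to become primary
    hidden: true   // invisible to clients, so it receives no reads
});
```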
Finally, there is one further (paid) option - MMS Backup. This service will essentially do all this for you (for a fee) and give you extras like point-in-time recovery and more - note: I work for MongoDB so I won't give you the hard sell here, but feel free to evaluate it yourself.
I do not understand why the failover to DC2 has to be done manually (even if other parts have to be done manually: one less thing on your to-do list in case of a major failure is always a good thing!).
In general, my feeling is that there are conceptual flaws in your setup.
Here is how I would do it and why.
- I would not have manual failover. It is better to have slow access than none. What will happen in the current configuration is that if the primary fails, the vote will be tied and therefore the whole set will enter secondary state, effectively turning the cluster into read-only mode. So even when everything else is fine in DC1 and there is no need to fail over to DC2, a failing primary will be a show stopper. With this setup, you are artificially creating a single point of failure, effectively going against the whole idea of a cluster, let alone a multi-DC setup. Sounds like a Very Bad Idea™ to me. Automatic failover, even to DC2, sounds like a better idea: slower reads and (depending on your write concern) slower writes are still better than read-only mode.
- I would have a third datacenter with only one instance: an arbiter. An arbiter can easily run on a micro-machine, as it is only called on in case of an election, and an election is a cheap task in terms of RAM and computation power. The arbiter helps the set always have a majority: if one DC gets disconnected for whatever reason, the other DC and the arbiter will form a majority. So if one DC goes down, you only have to worry about the other parts of your application, and you don't have to wait for a manual failover.
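Adding the arbiter in the third datacenter is a one-liner from the mongo shell on the current primary (the host name below is a placeholder):

```javascript
// Add an arbiter running in the third datacenter.
// "arbiter.dc3.example.net" is a hypothetical host name.
rs.addArb("arbiter.dc3.example.net:27017");
```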
- I am pretty sure that automatic failover for the other parts of your application can be achieved with some time and effort. Especially if you store all data in MongoDB and you have some sort of session replication available, it should be quite easy. Whether implementing automatic failover is worth the effort is pretty easy to calculate: take your average downtime and work out how big the losses created by this downtime are in terms of money and customer satisfaction (if applicable). If the cost of implementing automatic failover is below or equal to those losses, go for automatic failover. I can help you with that if needed.
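To make that break-even calculation concrete, here is a minimal sketch; all figures are made-up assumptions for illustration:

```javascript
// Break-even sketch for automating failover. All figures are hypothetical.
const downtimeHoursPerYear = 4;     // average downtime per year with manual failover
const costPerDowntimeHour = 2500;   // lost revenue + goodwill per hour (assumed)
const annualDowntimeCost = downtimeHoursPerYear * costPerDowntimeHour;

const implementationCost = 8000;    // one-off cost to build automatic failover (assumed)
const paybackYears = implementationCost / annualDowntimeCost;

console.log("Annual downtime cost: $" + annualDowntimeCost);          // $10000
console.log("Payback period: " + paybackYears.toFixed(1) + " years"); // 0.8 years
```

If the payback period is shorter than the planned lifetime of the system, automating the failover pays for itself.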
Best Answer
The usual reason for sharding is that your workload has exceeded the resources of a single server, so the expectation is that you would not be running `mongodump` via `mongos` for a full sharded cluster backup. Backups are instead done by stopping the balancer and then backing up a config server as well as a `mongod` from each shard.

As at MongoDB 3.0, `mongodump` does not provide an option to specify read preferences or tags. If you have specific secondaries in your DR data centre for backup purposes, you can use these `mongod` nodes in your backup procedure.

Recommended backup strategies are included in the MongoDB manual:
- Stop the balancer and back up all components of the sharded cluster with `mongodump`
- Stop the balancer and back up all components of the sharded cluster with file system snapshots
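A sketch of the stop-the-balancer-and-dump procedure (host names are placeholders); the balancer helpers run in a `mongos` shell, and the dumps run at the OS level against one config server and one `mongod` per shard:

```javascript
// From a mongos shell: stop the balancer so no chunks migrate mid-backup.
sh.stopBalancer();

// Then, at the OS level, dump one config server and one mongod per shard:
//   mongodump --host cfg1.example.net:27019 --out /backups/config
//   mongodump --host shard0-secondary.example.net:27018 --out /backups/shard0
//   mongodump --host shard1-secondary.example.net:27018 --out /backups/shard1

// Finally, restart the balancer:
sh.startBalancer();
```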
Backups with filesystem snapshots are faster to complete (and to restore) because they include all data and indexes. Backups using `mongodump` export the data and index definitions, but indexes will have to be rebuilt as part of the `mongorestore` procedure.

FYI, there is an open feature request you can watch/upvote in the MongoDB issue tracker: TOOLS-630: Allow specifying readPreference (including tags).