While upgrading my config servers to use WiredTiger, I stopped the balancer using sh.setBalancerState(false) and then ran sh.getBalancerState(). The output was false. Does this mean the balancer is not running? After that, I started upgrading the config servers to WiredTiger. But after reading the documentation carefully, I am not sure whether any migrations were still in progress after running sh.setBalancerState(false). If some migrations were still running while I backed up the config data and stopped the config servers one by one, what would the bad effect be? Now that the config servers are all up with WiredTiger, how can I check whether the config servers have the same data, especially the config/metadata?
MongoDB – Understanding sh.getBalancerState() vs sh.isBalancerRunning()
Related Solutions
The current wording in the documented steps may be somewhat confusing given the relative references to first/last/second config servers.
I've added some context for the documented steps below, but you should consider the manual the definitive source. It's also worth noting that you do not have to upgrade the config servers to use WiredTiger (even if your shards are using WiredTiger). Config servers typically have a small data set and are not under high write load.
Annotated version of the steps from the MongoDB 3.0 Upgrade Guide
1. Disable the balancer.

    Disabling the balancer ensures any active migrations have completed.
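    A minimal mongo shell sketch of this step, run against a mongos (the exact commands are my assumption, not quoted from the guide):

        sh.setBalancerState(false)  // disable the balancer
        sh.isBalancerRunning()      // confirm no migration is currently in progress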
2. Stop the last config server listed in your mongos' `configDB` setting (I'll call that `config3` for the purpose of these steps).

    At this stage you should have:

    - `config1` (running mmap)
    - `config2` (running mmap)
    - `config3` (stopped)

    Stopping one of the config servers ensures there are no changes to the metadata in the cluster (chunk splits or migrations cannot be committed without all three config servers available).
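    One way to stop a config server cleanly (a sketch, assuming you connect to `config3` directly with the mongo shell):

        use admin
        db.shutdownServer()  // cleanly shuts down the mongod you are connected to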
3. Use `mongodump` to export the config database from `config2`.

    After running the `mongodump` you should have a `dump` directory with bson files.
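    A hedged example of that dump (the hostname is a placeholder and 27019 is the conventional config server port; adjust both to your deployment):

        mongodump --host config2 --port 27019 --db config --out ./dump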
4. Create a new data directory on `config2`.

    The storage format for WiredTiger data is different from the existing mmap data, and WiredTiger cannot use the same dbpath as mmap.
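    For example (the path is an assumption; any empty directory writable by the mongod user will do):

        mkdir -p /data/configdb-wiredtiger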
5. Restart the `config2` server with WiredTiger and the appropriate storage options:

        mongod --storageEngine wiredTiger --dbpath <newWiredTigerDBPath> ...

    At this stage you should have:

    - `config1` (running mmap)
    - `config2` (running WiredTiger with no data)
    - `config3` (stopped)
    Note that `config2` doesn't have any data yet, because it has just been started up with the new WiredTiger dbpath.

6. Use `mongorestore` to load the config database backup you created in step 3.
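    A sketch of that restore, assuming the `dump` directory from step 3 and the host/port of the restarted `config2`:

        mongorestore --host config2 --port 27019 dump/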
    At this stage you should have:

    - `config1` (running mmap)
    - `config2` (running WiredTiger)
    - `config3` (stopped)
7. Shut down `config2`.

    At this stage you should have:

    - `config1` (running mmap)
    - `config2` (stopped)
    - `config3` (stopped)

    `config2` is stopped at this point to ensure no metadata changes can happen when we start `config3` up in the next step.
8. Restart `config3`.

    At this stage you should have:

    - `config1` (running mmap)
    - `config2` (stopped)
    - `config3` (running mmap)
(Steps 9-15) These steps just repeat the same `mongodump` and `mongorestore` process for each config server. There's a bit of shuffling to ensure you always have at least one config server available, and never have all three up while you are still migrating data.
16. Load the data into `config1`.

    At this stage you should have:

    - `config1` (running WiredTiger)
    - `config2` (stopped)
    - `config3` (running WiredTiger)
17. Start `config2`.

    With all three config servers available & upgraded, changes to the sharded cluster metadata can now resume.
18. Re-enable the balancer so normal balancing activity & chunk migration can resume.
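    The shell helper for this, run against a mongos, is:

        sh.setBalancerState(true)  // normal balancing & chunk migration resume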
It looks like the current MongoDB 3.0 upgrade instructions are missing mention of two important parameters for backing up and restoring users and roles:

- `mongodump --dumpDbUsersAndRoles` (see also: Required Access to Backup User Data)
- `mongorestore --restoreDbUsersAndRoles` (see also: Required Access to Restore User Data)
I can think of several approaches to fix this:

- If you don't have many user accounts on the config servers, recreate the administrator & user accounts. This isn't ideal, but is probably the fastest approach.
- Export the users from your mmap database. This is more involved, but saves you recreating the users & roles. I've described steps for this below.
- Redo the config server migration with the user & role information included. I expect this is the least desirable option.
Exporting the users
Assuming you have already upgraded all of your config servers to WiredTiger, here are some steps to add the user information:
1. Stop the last config server listed in your mongos' `configDB` setting (I'll call that `config3` for the purpose of these steps). This will ensure your sharded cluster metadata remains read-only for the following steps.

2. Re-start `config2` using the mmap data directory.
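    A hedged sketch of that restart (the options should mirror your original config server configuration; `<oldMmapDBPath>` is a placeholder):

        mongod --configsvr --storageEngine mmapv1 --dbpath <oldMmapDBPath> ...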
    At this stage you should have:

    - `config1` (running WiredTiger)
    - `config2` (running mmap with user/role data)
    - `config3` (stopped)
3. Export the data from `config2`:

        mongodump --db config --dumpDbUsersAndRoles --username .. --password ..

    Add any other parameters needed, e.g. `--authenticationDatabase ..` if you need to auth against another database. If you have users in the `admin` database on your config server, you will want to dump that as well.

4. (optional) Remove files from your dump except for the user/role information. If you are certain nothing has changed since you did the original migration from mmap to WiredTiger you could skip this step; however, it is safer not to overwrite any existing data.
    Preview the files to remove:

        find ./dump -type f -not -name "\$admin.system*"

    WARNING: the following removes files; make sure you have previewed the list first:

        find ./dump -type f -not -name "\$admin.system*" | xargs rm
5. Re-start `config2` using the wiredTiger storage engine.

6. Run:

        mongorestore --db config --restoreDbUsersAndRoles dump/config/

    You should see messages about restoring users & roles, for example:

        2015-03-18T02:41:34.887+1100 restoring users from dump/config/$admin.system.users.bson
        2015-03-18T02:41:34.887+1100 restoring roles from dump/config/$admin.system.roles.bson
7. Log in to `config2` and confirm the users are correctly set up (i.e. auth with the admin account, use `db.getUsers()` to check).
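    A sketch of that check from the mongo shell (the credentials are placeholders):

        use admin
        db.auth("admin", "<password>")  // authenticate as your admin user
        db.getUsers()                   // should list the restored users & roles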
    At this stage you should have:

    - `config1` (running WiredTiger)
    - `config2` (running WiredTiger with user/role data)
    - `config3` (stopped)
8. Copy the `dump` directory to `config1` and repeat the `mongorestore` step.

9. Shut down `config2` (to keep the sharded cluster metadata read-only for the next step).

    At this stage you should have:

    - `config1` (running WiredTiger with user/role data)
    - `config2` (stopped)
    - `config3` (stopped)
10. Start `config3`. Copy the `dump` directory to `config3` and repeat the `mongorestore` step.

    At this stage you should have:

    - `config1` (running WiredTiger with user/role data)
    - `config2` (stopped)
    - `config3` (running WiredTiger with user/role data)
11. Start `config2`. At this point all config servers should be online with the user information.

12. Re-enable the balancer so normal balancing activity & chunk migration can resume.
Best Answer
You can verify that no migrations are running by checking the balancer with `sh.isBalancerRunning()`, which is `true` if chunks are being migrated and `false` if not. `sh.getBalancerState()` only shows you whether the balancer is enabled or disabled, not its current run state. While it depends on what the specific documentation says, I'd probably feel safer setting the balancer state to false, checking the migration status with the above command, and then stopping it, as sketched below.
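A minimal mongo shell sketch of that sequence, run against a mongos (`sh.stopBalancer()` disables the balancer and then waits until no balancing round is in progress):

    sh.setBalancerState(false)  // disable the balancer
    sh.isBalancerRunning()      // true while chunks are being migrated
    sh.stopBalancer()           // disable and wait until the balancer is not running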
So now that we have the proper method clarified: I'm not too sure how gracefully MongoDB would handle stopping config servers while a migration was still in progress. However, you should be able to find the steps that occur during a migration in your log.
Always check the logs!
Update: As specified by the OP, you can also use `sh.status()` to check if there are any recorded errors in migrations from the balancer, provided this work occurred in the last 24 hours. If it was more than 24 hours ago, go check the logs.

Update 2: Marcus clarified in the comments that partial migrations are not possible, so this should not be a concern.