MongoDB – different db stats in MongoDB replica set

mongodb, mongodb-3.2, replication

I have a MongoDB (3.2.6) replica set with two nodes and an arbiter. The secondary node was out for quite a long time and was recently brought back up with an initial sync.
So at the moment, rs.status() reports the following:

rs.status()

{
    "set" : "anonymizedReplicaSet",
    "date" : ISODate("2017-05-24T07:33:08.629Z"),
    "myState" : 2,
    "term" : NumberLong(-1),
    "syncingTo" : "anonymized01:27017",
    "heartbeatIntervalMillis" : NumberLong(2000),
    "members" : [ 
        {
            "_id" : 1,
            "name" : "anonymized01:27017",
            "health" : 1.0,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 157061,
            "optime" : Timestamp(6423603132856545, 12785),
            "optimeDate" : ISODate("2017-05-24T07:40:52.000Z"),
            "lastHeartbeat" : ISODate("2017-05-24T07:33:06.790Z"),
            "lastHeartbeatRecv" : ISODate("2017-05-24T07:33:06.790Z"),
            "pingMs" : NumberLong(0),
            "electionTime" : Timestamp(6415474245823889, 1),
            "electionDate" : ISODate("2017-05-02T09:56:38.000Z"),
            "configVersion" : 10
        }, 
        {
            "_id" : 2,
            "name" : "anonymized02:27017",
            "health" : 1.0,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 157062,
            "optime" : Timestamp(6423603141446477, 10075),
            "optimeDate" : ISODate("2017-05-24T07:40:54.000Z"),
            "syncingTo" : "anonymized01:27017",
            "configVersion" : 10,
            "self" : true
        }, 
        {
            "_id" : 3,
            "name" : "anonymizedas:27017",
            "health" : 1.0,
            "state" : 7,
            "stateStr" : "ARBITER",
            "uptime" : 157061,
            "lastHeartbeat" : ISODate("2017-05-24T07:33:06.790Z"),
            "lastHeartbeatRecv" : ISODate("2017-05-24T07:33:07.248Z"),
            "pingMs" : NumberLong(0),
            "configVersion" : 10
        }
    ],
    "ok" : 1.0
}

However, I noticed that the two nodes do not agree on their stats. For example, when I issue db.stats() simultaneously on the two servers, I get:

Primary Node db.stats()

{
    "db" : "mydb",
    "collections" : 8,
    "objects" : 43933967,
    "avgObjSize" : 575.327071124718,
    "dataSize" : 25276400557.0,
    "storageSize" : 11909304320.0,
    "numExtents" : 0,
    "indexes" : 12,
    "indexSize" : 997511168.0,
    "ok" : 1.0
}

Secondary Node db.stats()

{
    "db" : "mydb",
    "collections" : 8,
    "objects" : 44016455,
    "avgObjSize" : 576.927973845236,
    "dataSize" : 25394324199.0,
    "storageSize" : 9882480640.0,
    "numExtents" : 0,
    "indexes" : 12,
    "indexSize" : 901128192.0,
    "ok" : 1.0
}

Shouldn't the number of objects on the two nodes agree? In addition, I receive different results when issuing mydb.mycollection.count() on each node. Is this normal behavior, or is something wrong with the syncing of the two nodes?

Best Answer

Logically, the answer is yes: the counts should agree.

When I tested this on our multi-terabyte databases, I also got differences between replica set members. However, that turned out to be a measurement mistake, an error in the procedure. For example, db.stats() is not a good command for this, because a replica set node can have a db.system.profile collection, which is NOT replicated to the other nodes.
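
You can check on each member whether such a local-only collection exists and whether profiling is enabled. This is only a minimal mongo shell sketch, assuming the database is called mydb (substitute your own database name):

// Run in the mongo shell on each replica set member.
// system.profile is created when profiling is enabled and is not replicated,
// so its presence alone can make db.stats() differ between nodes.
use mydb
db.getCollectionNames().filter(function (name) {
    return name.indexOf("system.") === 0;   // e.g. "system.profile"
})
db.getProfilingStatus()   // shows whether profiling is currently enabled on this node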

How to do it right (run these steps on all replica set nodes; see the sketch after the list):

  1. Stop all write traffic to MongoDB. You can do that by closing all clients OR by issuing the command db.fsyncLock() on the primary.
  2. Select the right database (use xyz) and issue the command db.setSlaveOk(true).
  3. Don't use db.collection.find().count() or db.collection.count(); instead, use db.collection.find({ _id: { $gt: MinKey, $lt: MaxKey } }, { _id: 1 }).itcount() to count how many documents there are.
  4. After you have run these itcount() commands on multiple collections and are happy (or not) with the results, don't forget to issue db.fsyncUnlock() on the primary.
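
Put together, the whole check might look roughly like this in the mongo shell. This is only a sketch, assuming the database is mydb and the collection is mycollection; substitute your own names:

// 1. On the PRIMARY only: block writes for the duration of the count.
db.fsyncLock()

// 2. On every node: select the database and allow reads on the secondary.
use mydb
db.setSlaveOk(true)

// 3. On every node: count documents with a ranged _id scan instead of the metadata-based count.
db.mycollection.find({ _id: { $gt: MinKey, $lt: MaxKey } }, { _id: 1 }).itcount()

// 4. Back on the PRIMARY: release the lock once you are done comparing the counts.
db.fsyncUnlock()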

We had a maintenance break, I ran this counting procedure during it, and (this time) got the correct result: every replica set node had the same count of documents. ;-)