MongoDB replication node stuck at “STARTUP2” with optimeDate as 1970


I have just set up a replica set with three nodes. The third node is stuck with stateStr STARTUP2 and "optimeDate" : ISODate("1970-01-01T00:00:00Z"), but it shows no error message. Is this all right? On the primary, rs.status() yields:

{
    "set" : "qdit",
    "date" : ISODate("2013-06-18T22:49:41Z"),
    "myState" : 1,
    "members" : [
        {
            "_id" : 0,
            "name" : "q.example.com:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 2940,
            "optime" : {
                "t" : 1371593311,
                "i" : 1
            },
            "optimeDate" : ISODate("2013-06-18T22:08:31Z"),
            "self" : true
        },
        {
            "_id" : 1,
            "name" : "q1.example.com:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 457,
            "optime" : {
                "t" : 1371593311,
                "i" : 1
            },
            "optimeDate" : ISODate("2013-06-18T22:08:31Z"),
            "lastHeartbeat" : ISODate("2013-06-18T22:49:40Z"),
            "lastHeartbeatRecv" : ISODate("2013-06-18T22:49:40Z"),
            "pingMs" : 0,
            "syncingTo" : "twitnot.es:27017"
        },
        {
            "_id" : 2,
            "name" : "q2.example.com:27017",
            "health" : 1,
            "state" : 5,
            "stateStr" : "STARTUP2",
            "uptime" : 300,
            "optime" : {
                "t" : 0,
                "i" : 0
            },
            "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
            "lastHeartbeat" : ISODate("2013-06-18T22:49:40Z"),
            "lastHeartbeatRecv" : ISODate("2013-06-18T22:49:41Z"),
            "pingMs" : 7
        }
    ],
    "ok" : 1
}

Also, db.printSlaveReplicationInfo() yields:

source:   qdit1.queuedit.com:27017
     syncedTo: Tue Jun 18 2013 22:08:31 GMT+0000 (UTC)
         = 2894 secs ago (0.8hrs)
source:   qdit2.queuedit.com:27017
     syncedTo: Thu Jan 01 1970 00:00:00 GMT+0000 (UTC)
         = 1371596205 secs ago (380998.95hrs)

Is this all right? Also, how can I test my replication, especially the third node?

Best Answer

No, this is not OK. STARTUP2 should only be a brief state that a secondary passes through on its way to a full sync and SECONDARY status (usually via RECOVERING); see the member states table in the MongoDB documentation for more. However, without seeing the log files it is impossible to say why it is stuck. The 1970 date you are seeing in optimeDate is simply time zero in Unix epoch terms, indicating that the member has not applied any operations yet.
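
A minimal sketch of checking for this condition programmatically, written as plain JavaScript that should also paste into the mongo shell. `findStuckMembers` is a hypothetical helper, not a shell built-in; it assumes every member document carries the `stateStr` and `optimeDate` fields shown in the rs.status() output above, and the sample object reproduces only those fields. Note that an arbiter or a RECOVERING member would also be flagged, since only PRIMARY and SECONDARY count as healthy here.

```javascript
// Hypothetical helper (not a mongo shell built-in): list replica set members
// that are neither PRIMARY nor SECONDARY, or that have never applied an op.
// `status` is assumed to be shaped like the document returned by rs.status().
function findStuckMembers(status) {
  var healthyStates = { PRIMARY: true, SECONDARY: true };
  return status.members
    .filter(function (m) {
      // An optimeDate of 0 ms since the Unix epoch (1970-01-01T00:00:00Z)
      // means the member has not applied any operations yet.
      var neverApplied = m.optimeDate.getTime() === 0;
      return !healthyStates[m.stateStr] || neverApplied;
    })
    .map(function (m) {
      return m.name + " is " + m.stateStr +
        " with optimeDate " + m.optimeDate.toISOString();
    });
}

// Only the relevant fields, copied from the rs.status() output in the question.
var sample = {
  members: [
    { name: "q.example.com:27017", stateStr: "PRIMARY",
      optimeDate: new Date("2013-06-18T22:08:31Z") },
    { name: "q1.example.com:27017", stateStr: "SECONDARY",
      optimeDate: new Date("2013-06-18T22:08:31Z") },
    { name: "q2.example.com:27017", stateStr: "STARTUP2",
      optimeDate: new Date(0) }
  ]
};

// Yields one entry, for q2.example.com:27017.
var report = findStuckMembers(sample);
```

Against a live set you would call `findStuckMembers(rs.status())` from the mongo shell instead of passing the sample object.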

The basic method for restoring this secondary is to shut it down, wipe out its data files (all files in the dbpath), and restart it. That restarts the initial sync process, and the member should reach SECONDARY. If it gets stuck again, look at the logs for more information about why; the most common causes are some sort of resource issue or the member not being able to talk to the primary of the set reliably.
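
To test whether the resynced third node is actually catching up, one option is to compare each member's optimeDate with the primary's. The sketch below uses a hypothetical helper, `lagBehindPrimary`, under the same assumptions as before (an rs.status()-shaped document whose members all report `stateStr` and `optimeDate`). This also puts the printSlaveReplicationInfo() figures in the question in context: both rows are consistent with being measured against the same wall-clock moment (1371593311 + 2894 = 1371596205), so the second member shows 2894 seconds only because no writes have happened since 22:08:31, even though its optime matches the primary's exactly.

```javascript
// Hypothetical helper (not a mongo shell built-in): how many seconds each
// non-primary member's last applied op trails the primary's, based on the
// optimeDate field of an rs.status()-shaped document.
function lagBehindPrimary(status) {
  var primary = status.members.filter(function (m) {
    return m.stateStr === "PRIMARY";
  })[0];
  if (!primary) {
    throw new Error("no PRIMARY member in status document");
  }
  var lag = {};
  status.members.forEach(function (m) {
    if (m !== primary) {
      // getTime() returns milliseconds since the epoch; convert to seconds.
      lag[m.name] = Math.round(
        (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000);
    }
  });
  return lag;
}

// Only the relevant fields, copied from the rs.status() output in the question.
var sample = {
  members: [
    { name: "q.example.com:27017", stateStr: "PRIMARY",
      optimeDate: new Date("2013-06-18T22:08:31Z") },
    { name: "q1.example.com:27017", stateStr: "SECONDARY",
      optimeDate: new Date("2013-06-18T22:08:31Z") },
    { name: "q2.example.com:27017", stateStr: "STARTUP2",
      optimeDate: new Date(0) }
  ]
};

// q1 trails by 0 s; q2 trails by 1371593311 s, the primary's optime "t" value.
var result = lagBehindPrimary(sample);
```

After wiping and restarting the third node, you could rerun `lagBehindPrimary(rs.status())` in the mongo shell; once the member reaches SECONDARY, you would expect its value to drop to near 0, possibly fluctuating slightly while writes are flowing.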