MongoDB config server won’t start – compatibility error

configuration, mongodb-4.0, sharding

I recently upgraded a dev sharded cluster from 3.6 to 4.0. One of the config servers is now failing to start:

2020-01-30T13:19:02.972-0600 F CONTROL [initandlisten] ** IMPORTANT:
UPGRADE PROBLEM: Found an invalid featureCompatibilityVersion document
(ERROR: BadValue: Invalid value for version, found 3.6, expected '4.2'
or '4.0'. Contents of featureCompatibilityVersion document in
admin.system.version: { _id: "featureCompatibilityVersion", version:
"3.6" }. See
http://dochub.mongodb.org/core/4.0-feature-compatibility.). If the
current featureCompatibilityVersion is below 4.0, see the
documentation on upgrading at
http://dochub.mongodb.org/core/4.0-upgrade-fcv.

The upgrade from 3.6 to 4.0 was reported as successful. I checked the data shards but didn't think to check each config server. Today I upgraded the mongo binaries, and on reboot one of the config servers failed to start with the error above. The cluster had been running at 4.0 for weeks before the reboot exposed the problem.

The other two config servers are running and report that they're at 4.0, as do all of the data shards.
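For reference, a minimal sketch of how the per-member check could be scripted. It assumes PyMongo and a direct connection to each mongod (not to mongos); the host name in the usage note is hypothetical, and the helper names are made up for illustration. The `getParameter` command with `featureCompatibilityVersion: 1` is the documented way to read the value on 3.6 and later.

```python
from typing import Any, Dict, List, Tuple


def fcv_query() -> Dict[str, int]:
    """Command document asking a mongod for its featureCompatibilityVersion."""
    # Key order matters: the command name must come first.
    return {"getParameter": 1, "featureCompatibilityVersion": 1}


def read_fcv(client: Any) -> str:
    """Return the FCV string of the member `client` is connected to.

    `client` is expected to behave like a pymongo.MongoClient opened
    directly against one mongod.
    """
    reply = client.admin.command(fcv_query())
    return reply["featureCompatibilityVersion"]["version"]


def _as_tuple(version: str) -> Tuple[int, ...]:
    """Turn "4.0" into (4, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def members_below(fcv_by_host: Dict[str, str], required: str) -> List[str]:
    """List the hosts whose reported FCV is lower than `required`."""
    floor = _as_tuple(required)
    return sorted(h for h, v in fcv_by_host.items() if _as_tuple(v) < floor)
```

Usage would look roughly like `read_fcv(MongoClient("mongodb://cfg1.example.net:27019", directConnection=True))`, repeated for every config server and shard member. A member that cannot start cannot be queried this way, so it would only catch the mismatch before a binary upgrade.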

I cannot start the config server without hitting this error, which shuts the server down, so it is impossible to issue the setFeatureCompatibilityVersion command from mongos for that node.
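The command in question, sketched with PyMongo under the assumption that `mongos_client` is a client connected to a mongos of a running cluster. The helper names are hypothetical; the accepted values listed are the ones a 4.0 binary takes for this command.

```python
from typing import Any, Dict

# Assumption: the cluster runs 4.0 binaries, which accept these two values.
SUPPORTED_ON_4_0 = ("3.6", "4.0")


def set_fcv_command(version: str) -> Dict[str, str]:
    """Build the admin command that sets the feature compatibility version."""
    if version not in SUPPORTED_ON_4_0:
        raise ValueError(f"unsupported FCV for 4.0 binaries: {version!r}")
    return {"setFeatureCompatibilityVersion": version}


def set_fcv(mongos_client: Any, version: str) -> Any:
    """Run the command against the admin database through mongos."""
    return mongos_client.admin.command(set_fcv_command(version))
```

This only reaches members that are up, which is exactly the difficulty described above: the node that refuses to start never receives it.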

Since the other two config nodes are running and reporting the correct release version, would it be best to just nuke the down server's data, restart the node, and then issue the command to ensure that the restarted node's data version is correct? Or is there some sort of force option that would bypass the version check?

TIA!

Best Answer

Went with the nuclear option:

  • stopped the orphaned server
  • moved its data dir aside and created a new, empty one
  • removed the orphaned server from the replica set (run from the primary)
  • started the orphaned server with the empty data dir
  • added the orphaned server back to the replica set (run from the primary)

Data started syncing immediately, and rs.status() for the config server replica set reports that all is well.
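The remove and re-add steps can also be scripted. Below is a rough sketch of what the shell helpers `rs.remove()` and `rs.add()` amount to, assuming PyMongo and a client connected to the primary of the config server replica set. The function names and the host names in the usage note are hypothetical. Stopping mongod and moving the data directory are OS-level steps and are not covered here.

```python
import copy
from typing import Any, Dict


def current_config(primary_client: Any) -> Dict[str, Any]:
    """Fetch the replica set configuration document from the primary."""
    return primary_client.admin.command("replSetGetConfig")["config"]


def without_member(config: Dict[str, Any], host: str) -> Dict[str, Any]:
    """Return a copy of `config` with `host` removed and the version bumped."""
    new = copy.deepcopy(config)
    new["members"] = [m for m in new["members"] if m["host"] != host]
    if len(new["members"]) == len(config["members"]):
        raise ValueError(f"{host} is not a member of this replica set")
    new["version"] += 1
    return new


def with_member(config: Dict[str, Any], host: str) -> Dict[str, Any]:
    """Return a copy of `config` with `host` appended and the version bumped."""
    new = copy.deepcopy(config)
    next_id = max(m["_id"] for m in new["members"]) + 1
    new["members"].append({"_id": next_id, "host": host})
    new["version"] += 1
    return new


def apply_config(primary_client: Any, new_config: Dict[str, Any]) -> Any:
    """Submit the modified configuration to the primary."""
    return primary_client.admin.command("replSetReconfig", new_config)
```

In use, one would call `apply_config(client, without_member(current_config(client), "cfg3.example.net:27019"))` while the node is down, then `apply_config(client, with_member(current_config(client), "cfg3.example.net:27019"))` once it is running on the empty data directory, at which point initial sync should begin.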