Mongodb – 294 : Failed with error ‘aborted’, from rs4 to rs0


Regarding "Failed with error 'aborted'": will we lose the chunks?

In the logs I found this:

2017-09-14T01:35:21.533+0900 W SHARDING [conn38309]
Chunk move failed :: caused by :: ChunkRangeCleanupPending: 
can't accept new chunks because there are still 4 deletes from previous migration

2017-09-14T01:35:11.439+0900 I COMMAND  [conn38309] command admin.$cmd
command: moveChunk { moveChunk: "transam_db.panelist", 
shardVersion: [ Timestamp 762000|1, ObjectId('590b4564c5ac1ee3b6e9050d') ],
epoch: ObjectId('590b4564c5ac1ee3b6e9050d'), 
configdb: "configReplSet/mgdb07:27019,mgdb08:27019,mgdb09:27019", 
fromShard: "rs1", toShard: "rs4", 
min: { panelist_id: 407157 }, 
max: { panelist_id: 416836 }, 
chunkVersion: [ Timestamp 762000|1, ObjectId('590b4564c5ac1ee3b6e9050d') ],
maxChunkSizeBytes: 67108864, waitForDelete: false, takeDistLock: false }
exception: can't accept new chunks because there are still 4 deletes from previous migration 
code:200 numYields:75 reslen:278 locks:{ Global: { acquireCount: { r: 161, w: 3 } }, 
Database: { acquireCount: { r: 79, w: 3 } }, 
Collection: { acquireCount: { r: 79, W: 3 } } } 
protocol:op_command 2622ms

Because of this, the chunks cannot be moved. How can we fix this?

Best Answer

The answer is no.

Since MongoDB 3.4 the balancer can run several chunk migrations in parallel, and it is a "little bit" stupid in that it can try to move a chunk from or to a shard that is already participating in another chunk migration. That is not possible, so the new migration is aborted.
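To make the conflict concrete, here is a toy model. It is a hypothetical sketch, not MongoDB's balancer code: it assumes only the rule stated above (a shard can take part in one migration at a time), and `ToyBalancer`, `start_move` and `finish_move` are made-up names.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ToyBalancer:
    """Hypothetical model: a shard may take part in at most one chunk
    migration at a time, either as donor or as recipient."""

    busy: Set[str] = field(default_factory=set)

    def start_move(self, from_shard: str, to_shard: str) -> str:
        # A move that touches a shard already busy migrating is aborted
        # immediately; no documents are read or deleted.
        if from_shard in self.busy or to_shard in self.busy:
            return "aborted"
        self.busy.update({from_shard, to_shard})
        return "started"

    def finish_move(self, from_shard: str, to_shard: str) -> None:
        # Both shards become free for the next migration.
        self.busy.difference_update({from_shard, to_shard})


balancer = ToyBalancer()
print(balancer.start_move("rs1", "rs4"))  # started
print(balancer.start_move("rs4", "rs0"))  # aborted: rs4 is still busy
balancer.finish_move("rs1", "rs4")
print(balancer.start_move("rs4", "rs0"))  # started
```

In this model an "aborted" result is just a refused request, which is why the counter can grow into the hundreds without any data being touched.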

If you start seeing huge aborted counts, like thousands, check the log files of the shards whose names appear most often in the error list, like rs4 here. The log files show the reason why each move was aborted. For example, a cursor with no timeout can keep the range deletion from a previous migration hanging.
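One rough way to find the shards that are "most listed" is to sum the failure counts per shard from migration-result lines shaped like the one in the question title (`294 : Failed with error 'aborted', from rs4 to rs0`). The line format is taken from that title; feeding it lines copied from the balancer's migration summary (for example `sh.status()` output) and the name `shards_by_failures` are assumptions.

```python
import re
from collections import Counter

# Assumed line shape, copied from the question title, e.g.
#   "294 : Failed with error 'aborted', from rs4 to rs0"
# Both straight and curly quotes are accepted around the error text.
RESULT_LINE = re.compile(
    r"(?P<count>\d+)\s*:\s*Failed with error\s*['‘’](?P<error>[^'‘’]+)['‘’],"
    r"\s*from\s+(?P<src>\S+)\s+to\s+(?P<dst>\S+)"
)


def shards_by_failures(lines):
    """Sum failed-migration counts per shard (donor or recipient), highest first."""
    per_shard = Counter()
    for line in lines:
        m = RESULT_LINE.search(line)
        if not m:
            continue  # success lines and anything else are ignored
        n = int(m.group("count"))
        per_shard[m.group("src")] += n
        per_shard[m.group("dst")] += n
    return per_shard.most_common()


summary = [
    "294 : Failed with error 'aborted', from rs4 to rs0",
    "12 : Failed with error 'aborted', from rs1 to rs4",
]
print(shards_by_failures(summary))  # rs4 comes first: its logs are the ones to read
```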

An aborted chunk move never loses data, though. Data is never removed from the source shard until it has been copied to the new location and the chunk has been marked as living there.
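The ordering that makes this safe (copy, then commit, then delete on the donor) can be sketched with plain dictionaries. This is a hypothetical toy, not MongoDB's migration code; `toy_migrate` and `abort_before_commit` are made-up names, and what the real recipient does with a partial copy is not modelled (here it is simply discarded).

```python
from typing import Dict, Iterable


def toy_migrate(donor: Dict, recipient: Dict, keys: Iterable,
                abort_before_commit: bool = False) -> str:
    """Hypothetical model of copy -> commit -> delete (not MongoDB's code)."""
    keys = list(keys)
    # 1. Copy: documents in the chunk range are cloned to the recipient.
    for k in keys:
        recipient[k] = donor[k]
    # 2. An abort before the commit leaves the donor untouched; in this toy
    #    the recipient's partial copy is simply discarded.
    if abort_before_commit:
        for k in keys:
            recipient.pop(k, None)
        return "aborted"
    # 3. Commit: the recipient now owns the chunk. Only after this point
    #    are the documents removed from the donor.
    for k in keys:
        del donor[k]
    return "committed"


donor = {407157: "panelist A", 407158: "panelist B"}
recipient = {}
print(toy_migrate(donor, recipient, [407157], abort_before_commit=True))
print(donor, recipient)  # the donor still holds both documents
```

Whatever step fails, every document exists on at least one side, which is the property the answer relies on.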

To find the reason for those aborts, you must check the mongod.log files of all the (primary) shard members; without them it is impossible to tell. Every migration has a donor (from) shard and a recipient (to) shard, and both of their log files contain information about the aborted moveChunk operation.
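A minimal sketch for pulling those reasons out of a mongod.log, assuming the 3.4-era plain-text format shown in the question, with each moveChunk COMMAND entry on a single line (the question wraps it for display). The regexes and the name `abort_reasons` are illustrative assumptions, not a MongoDB tool.

```python
import re
from collections import Counter

# Matches the donor/recipient fields of a logged moveChunk command.
FROM_TO = re.compile(r'fromShard: "(?P<src>[^"]+)", toShard: "(?P<dst>[^"]+)"')
# The failure text sits between "exception:" and the numeric error code.
EXCEPTION = re.compile(r"exception: (?P<reason>.*?) code:(?P<code>\d+)")


def abort_reasons(log_lines):
    """Count failed moveChunk commands by (from shard, to shard, reason)."""
    reasons = Counter()
    for line in log_lines:
        if "command: moveChunk" not in line:
            continue
        route, exc = FROM_TO.search(line), EXCEPTION.search(line)
        if route and exc:
            reasons[(route["src"], route["dst"], exc["reason"].strip())] += 1
    return reasons


# Usage (the path is hypothetical):
# with open("/var/log/mongodb/mongod.log") as f:
#     for key, n in abort_reasons(f).most_common(10):
#         print(n, key)
```

Run it on the primary log of each shard involved; the donor and recipient may report different reasons for the same move.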

Related Solutions

Mongodb – “ERROR: could not read from config file” MongoDB

First, assuming you are actually specifying where to read the file from, make sure that you have permission to read that file as the current user (cat /usr/local/Cellar/mongodb/2.4.6/mongod.conf, or use less, vi or your editor of choice). If that works (and if it does not, adjust your permissions), the next thing to do is make sure you are actually pointing at the correct file.

However, if you are not specifying where to read the file from, then by default a mongod run from the brew installation will attempt to read from:

/usr/local/etc/mongod.conf

I verified this by installing 2.4.6 with brew and then checking the logs when it starts up:

Tue Oct 22 17:17:30.695 [initandlisten] options: { bind_ip: "127.0.0.1", config: "/usr/local/etc/mongod.conf", dbpath: "/usr/local/var/mongodb", logappend: "true", logpath: "/usr/local/var/log/mongodb/mongo.log" }

You can either modify that file (/usr/local/etc/mongod.conf) to look the way you want in your example, making sure you have permission to read it, or you can run this instead to specify the original file:

mongod -f /usr/local/Cellar/mongodb/2.4.6/mongod.conf
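The two checks above (which file is being read, and whether the current user can read it) can be scripted. This is a sketch under assumptions: the default path is the one observed in the brew startup log above, and `config_to_check` and `diagnose` are hypothetical helpers, not part of MongoDB.

```python
import os
from typing import Optional

# Path observed in the brew startup log above; other installs may differ.
BREW_DEFAULT_CONF = "/usr/local/etc/mongod.conf"


def config_to_check(explicit_path: Optional[str] = None) -> str:
    """The file passed with -f/--config, otherwise the brew default."""
    return explicit_path or BREW_DEFAULT_CONF


def diagnose(explicit_path: Optional[str] = None) -> str:
    """Report whether the config file exists and is readable by this user."""
    path = config_to_check(explicit_path)
    if not os.path.exists(path):
        return f"{path}: missing (wrong path?)"
    if not os.access(path, os.R_OK):
        return f"{path}: not readable by the current user (fix permissions)"
    return f"{path}: readable"


print(diagnose("/usr/local/Cellar/mongodb/2.4.6/mongod.conf"))
```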

Mongodb – Mongo sharding issue with chunk split and Data transfer

You really only have 2 choices when a primary is still processing deletes from previous migrations (which is why you are getting the failure to engage error):

Wait for the deletes to finish
Step down the primary of that shard (assuming it is a replica set)

The first option may take a long time if the shard in question is under heavy load, but it is the safest way forward. The second option makes the primary step down to secondary and terminates the background threads doing the deletes; if you have only a single node, your only option is a restart, with downtime. Either way, the new primary (or restarted node) can then accept migrations, but the interrupted deletes leave orphaned documents behind, which you will need to clean up manually later.

If you are in any doubt, then the best thing to do is wait for the deletes to finish.
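If you do step down and need to clean up the orphaned documents afterwards, the usual tool is the `cleanupOrphaned` admin command, run against each shard primary directly rather than through mongos. The sketch below assumes a 3.x-era server, where each call cleans one orphaned range and returns `stoppedAtKey` while more may remain; later MongoDB versions changed this command, so check the documentation for your version. The helper name `cleanup_orphans` is hypothetical.

```python
def cleanup_orphans(admin_db, namespace):
    """Run cleanupOrphaned repeatedly on one shard primary until done.

    `admin_db` is assumed to be a pymongo Database for "admin", connected
    directly to the shard primary (not via mongos). pymongo raises
    OperationFailure if the server replies with ok != 1.
    Returns the number of passes made.
    """
    next_key = {}  # an empty key starts at the lowest shard key value
    passes = 0
    while next_key is not None:
        result = admin_db.command(
            "cleanupOrphaned", namespace, startingFromKey=next_key
        )
        passes += 1
        # Each pass handles one orphaned range; stoppedAtKey is absent
        # once there is nothing left to clean.
        next_key = result.get("stoppedAtKey")
    return passes
```

With pymongo this would be called as `cleanup_orphans(MongoClient(shard_primary_uri).admin, "transam_db.panelist")`, where `shard_primary_uri` is a placeholder for your shard primary's address.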
