Mongodb – 294 : Failed with error ‘aborted’, from rs4 to rs0

mongodb

Regarding failed with error aborted will we lose the chunks

enter image description here

In logs I found this:

2017-09-14T01:35:21.533+0900 W SHARDING [conn38309]
Chunk move failed :: caused by :: ChunkRangeCleanupPending: 
can't accept new chunks because there are still 4 deletes from previous migration
2017-09-14T01:35:11.439+0900 I COMMAND  [conn38309] command admin.$cmd
command: moveChunk { moveChunk: "transam_db.panelist", 
shardVersion: [ Timestamp 762000|1, ObjectId('590b4564c5ac1ee3b6e9050d') ],
epoch: ObjectId('590b4564c5ac1ee3b6e9050d'), 
configdb: "configReplSet/mgdb07:27019,mgdb08:27019,mgdb09:27019", 
fromShard: "rs1", toShard: "rs4", 
min: { panelist_id: 407157 }, 
max: { panelist_id: 416836 }, 
chunkVersion: [ Timestamp 762000|1, ObjectId('590b4564c5ac1ee3b6e9050d') ],
maxChunkSizeBytes: 67108864, waitForDelete: false, takeDistLock: false }
exception: can't accept new chunks because there are still 4 deletes from previous migration 
code:200 numYields:75 reslen:278 locks:{ Global: { acquireCount: { r: 161, w: 3 } }, 
Database: { acquireCount: { r: 79, w: 3 } }, 
Collection: { acquireCount: { r: 79, W: 3 } } } 
protocol:op_command 2622ms

Due to this, it's unable to move the chunks. Please give a solution.

Best Answer

Answer is NO.

At 3.4 mongodb we got multi threaded chunk balancing and it is "little bit" stupid that way that it can try to move chunk (from or to) shard what is currently already participating in some other chunk move operation. Of course, that is not possible and then that new move operation is aborted.

If you start seeing huge aborted values, like thousands, check log files of those shards whose name is most listed on error list. Like rs4 there. From log files you see what was "reason" why move was aborted. There can be be cursor with no timeout and previous move operations remove is hanging there.

But, aborted chunk move does not lose data, ever. Data is never removed until it is copied to the new location and then mark to be there.

To find a possible reason for those aborts, all (primary) mongod.log files must be checked. Without them, it's impossible. There is always from node and to node, both log files have information about "that" movechunk operation what was aborted.