Mongodb – Improve the insertion rate of mongo sharded cluster

mongodbnosqlperformance

I can't disable the write concern while i mongorestore with .bson file. It totally damages the insertion rate (insert/sec).

This is my sharding status.

mongos> sh.status()
--- Sharding Status --- 
sharding version: {
"_id" : 1,
"version" : 4,
"minCompatibleVersion" : 4,
"currentVersion" : 5,
"clusterId" : ObjectId("546bd498f0531e27d40544b2")
} shards: { "_id" : "shard2", "host" : "shard2/hadoop5.xx.com:27018" } databases: { "_id" : "admin", "partitioned" : false, "primary" : "config" } { "_id" : "deneme6", "partitioned" : false, "primary" : "shard2" } { "_id" : "seref", "partitioned" : true, "primary" : "shard2" }

This is my shard. I already changed the property of getLastErrorDefaults w:0 for disable write concern.

shard2:PRIMARY> show tables; me oplog.rs startup_log system.indexes system.replset          shard2:PRIMARY> db.me.find() { "_id" : ObjectId("546bd2869e14c01f13708886"), "host" : "hadoop5.xx.com" } 
shard2:PRIMARY> db.system.replset.find() { "_id" : "shard2", "version" : 2, "members" : [ { "_id" : 0, "host" : "hadoop5.xx.com:27018" } ], "settings" : { "getLastErrorDefaults" : { "w" : 0 } } } 
shard2:PRIMARY>

This is my mongorestore command which load data via query router ( mongos )

mongorestore --db seref --collection can11 --noobjcheck --w 0 --noIndexRestore --host       10.21.12.21 --port 27017 /tmp/iabc_kwd.bson

When i try to mongorestore on sharded cluster via mongos the insertion rate is 300k byte / sec , when i try to mongorestore on single mongodb the insertion rate is 12m /sec.

How can i speed up the insertion rate of restoring mongodb sharded cluster?

-- Sharding Status --- 
  sharding version: {
    "_id" : 1,
    "version" : 3,
    "minCompatibleVersion" : 3,
    "currentVersion" : 4,
    "clusterId" : ObjectId("546ca29960e36f8983130015")
}
  shards:
    {  "_id" : "shard0000",  "host" : "hadoop5:27018" }
    {  "_id" : "shard0001",  "host" : "hadoop6:27017" }
    {  "_id" : "shard0002",  "host" : "hadoop4:27017" }
  databases:
    {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
    {  "_id" : "test",  "partitioned" : false,  "primary" : "shard0000" }
    {  "_id" : "seref",  "partitioned" : true,  "primary" : "shard0000" }
    {  "_id" : "abc",  "partitioned" : false,  "primary" : "shard0001" }
    {  "_id" : "mahmut",  "partitioned" : true,  "primary" : "shard0002" }
    {  "_id" : "incee",  "partitioned" : true,  "primary" : "shard0002" }
        incee.onur
            shard key: { "_id" : "hashed" }
            chunks:
                shard0000   1
                shard0002   501
            too many chunks to print, use verbose if you want to force print

All shards computer have 8 GB RAM , and they have 1.3 TB HDD.

Best Answer

The problem (from what I can see)

From what I can see from your output, you have a monotonically increasing shard key like ObjectId or a Date or something similar.

MongoDB sharding is done with key ranges over the selected shard key. Put simply, it works like "shard0000 is supposed to hold the rang from -infinity to $i, shard0001 the keys from $h to $p and shard0002 from $q to infinity. With a monotonically increasing shard key, every new value will be bigger than the last one inserted, heading to infinity, so the shard which will be used is always shard0002, instead of the writes being balanced to all shards.

So all of the write operations seem to hit shard0002 with individual chunks being migrated to the other shards. The necessary chunk migrations together with added network latency and a lot of metadata updates going on, it may well be that you have a poor write performance.

The solution

Really bad news: you can't change the shard key once it is set. You basically have to create new sharded collection with a better shard key. For choosing a proper shard key, please read Considerations for Selecting Shard Keys in the MongoDB docs.

P.S. I wasn't sure wether to answer this question, since it is slightly off topic on Stackoverflow, which is aimed at programming questions. But since usually the developers are supposed to choose the shard key, my intention was to clarify things for them. Additionally, the data modeling might be influenced by sharding considerations, so – by a small margin – I thought it was ok to answer here. However, dear reviewer, feel free to flag this answer for deletion if you think differently. I am really not sure.

Related Solutions

Mongodb – Mongo sharding issue with chunk split and Data transfer

You really only have 2 choices when a primary is still processing deletes from previous migrations (which is why you are getting the failure to engage error):

Wait for the deletes to finish
Step down the primary of that shard (assuming it is a replica set)

The first action may take a long time if the shard in question is under heavy load, but it is the safest way forward. The second option will cause the primary to step down to secondary and terminate the background threads for the deletes - if you have only a single node then your only option is a restart, and downtime. Either way, this will allow the new primary (or restarted node) to accept migrations, but it will create orphans which you will need to manually clean up later.

If you are in any doubt, then the best thing to do is wait for the deletes to finish.

Mongodb – Chunk Size is shown as 1 KB in mongo db logs even it is set to 300 MB

How did you change the chunk size? Did you follow this guide?

To recap, one should:

connect to mongos
switch to config db
issue change chunk size command:
db.settings.save( { _id:"chunksize", value: <sizeInMB> } )

AFAIK,

warning: chunk is larger than 1024 bytes because of key ...

won't bring down a shard, it will only result in a unbalanced cluster.