I have a question about MongoDB sharding.
I have 7 VMs: 1 master mongos router, 3 config servers and 3 shard servers.
Mongo Main - 192.168.212.173 port 27020
Mongo Shard1 - 192.168.212.175 port 27018
Mongo Shard2 - 192.168.212.176 port 27018
Mongo Shard3 - 192.168.212.177 port 27018
Mongo Config1 - 192.168.212.179 port 27019
Mongo Config2 - 192.168.212.180 port 27019
Mongo Config3 - 192.168.212.181 port 27019
I configured all the servers and defined a shard key on the collection testDB.testData.
On the master I log in with mongo 192.168.212.173:27020/admin (the mongos router).
When I look at sh.status() I see the following:
sh.status()
--- Sharding Status ---
sharding version: {
"_id" : 1,
"version" : 3,
"minCompatibleVersion" : 3,
"currentVersion" : 4,
"clusterId" : ObjectId("55a4c899d077ed690ce794ff")
}
shards:
{ "_id" : "shard0000", "host" : "192.168.212.175:27018" }
{ "_id" : "shard0001", "host" : "192.168.212.176:27018" }
{ "_id" : "shard0002", "host" : "192.168.212.177:27018" }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "testDB", "partitioned" : true, "primary" : "shard0000" }
testDB.testData
shard key: { "name" : 1 }
chunks:
shard0000 1
{ "name" : { "$minKey" : 1 } } -->> { "name" : { "$maxKey" : 1 } } on : shard0000 Timestamp(1, 0)
{ "_id" : "db", "partitioned" : false, "primary" : "shard0001" }
{ "_id" : "test", "partitioned" : false, "primary" : "shard0001" }
I insert about 10,000 records with the following JSON document, where name is a unique string:
{
"_id" : ObjectId("55a4e563c27f4a62558b4567"),
"name" : "EDBfuGq57A"
}
I insert this with the following command (PHP driver):
$collection_sharding->save($mongo_data);
where $mongo_data is the JSON document above.
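The insert step can also be sketched directly in the mongo shell (a sketch, not the asker's actual PHP code; the randomName helper is a hypothetical stand-in for however the unique 10-character names like "EDBfuGq57A" were generated):

```javascript
// Helper producing pseudo-random 10-character names like the sample document's.
var chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
function randomName(len) {
  var s = "";
  for (var i = 0; i < len; i++) {
    s += chars.charAt(Math.floor(Math.random() * chars.length));
  }
  return s;
}

// In the mongo shell, connected through mongos (192.168.212.173:27020):
// for (var i = 0; i < 10000; i++) {
//   db.getSiblingDB("testDB").testData.insert({ name: randomName(10) });
// }
```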
If I look at where the data is stored, I see that it is only stored on Shard1 (192.168.212.175).
sh.getBalancerState() is true.
When I look at the mongostat output of all three shard servers, I see that only Shard1 is active on testDB:
mongostat --host 192.168.212.177:27018,192.168.212.176:27018,192.168.212.175:27018
192.168.212.175:27018 *0 *0 *0 *0 0 2|0 0 288m 835m 125m 0 testDB:0.0% 0 0|0 0|0 120b 2k 11 13:06:23
192.168.212.176:27018 *0 *0 *0 *0 0 2|0 0 80m 416m 48m 0 test:0.0% 0 0|0 0|0 120b 2k 10 13:06:23
192.168.212.177:27018 *0 *0 *0 *0 0 2|0 0 80m 398m 44m 0 local:0.0% 0 0|0 0|0 120b 2k 10 13:06:23
192.168.212.175:27018 *0 *0 *0 *0 0 1|0 0 288m 835m 125m 0 testDB:0.0% 0 0|0 0|0 62b 2k 11 13:06:24
192.168.212.176:27018 *0 *0 *0 *0 0 1|0 0 80m 416m 48m 0 test:0.0% 0 0|0 0|0 62b 2k 10 13:06:24
192.168.212.177:27018 *0 *0 *0 *0 0 1|0 0 80m 398m 44m 0 local:0.0% 0 0|0 0|0 62b 2k 10 13:06:24
After inserting the 10,000 documents I noticed that the sharding has stopped.
sh.status()
--- Sharding Status ---
sharding version: {
"_id" : 1,
"version" : 3,
"minCompatibleVersion" : 3,
"currentVersion" : 4,
"clusterId" : ObjectId("55a50555fe726cefcf33b2f1")
}
shards:
{ "_id" : "shard0000", "host" : "192.168.212.175:27018" }
{ "_id" : "shard0001", "host" : "192.168.212.176:27018" }
{ "_id" : "shard0002", "host" : "192.168.212.177:27018" }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "test", "partitioned" : false, "primary" : "shard0000" }
{ "_id" : "testDB", "partitioned" : true, "primary" : "shard0000" }
{ "_id" : "db", "partitioned" : false, "primary" : "shard0001" }
Could anybody explain to me why Shard2 and Shard3 are not filling with data for database testDB, and why the sharding has stopped?
Kind Regards,
Arjan Kroon
Best Answer
In order for data to be distributed across multiple shards, you need to have multiple chunks for a given sharded collection. A chunk is a contiguous range of shard key values representing approximately 64 megabytes of data by default.
Chunk ranges are normally created automatically based on observation of the shard key values and size of documents being inserted through mongos. When documents are added to a chunk range which is approaching the maximum chunk size, mongos will trigger a chunk split which will try to create smaller logical chunks for balancing.

Currently all data in your sharded testDB.testData collection is covered by a single chunk on shard0000 (aka your Shard1 server).

Using Object.bsonsize() in the mongo shell to determine document size, your sample document in the question is about 43 bytes in BSON. With only 10,000 documents you would not be close to reaching the 64 megabytes of data required to trigger any chunk splits. Something closer to 1.5 million documents of that size would be required.

You will also need to ensure the values for your chosen shard key have reasonable uniqueness. A poor shard key (i.e. one with many common values) can lead to indivisible jumbo chunks that cannot be further split.
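That size estimate can be checked with quick arithmetic (a sketch; the 43-byte figure is the Object.bsonsize() measurement mentioned above):

```javascript
// Back-of-the-envelope check: how many ~43-byte documents fit in one
// default 64 MB chunk before a split would even be considered?
var chunkSizeBytes = 64 * 1024 * 1024; // 67,108,864 bytes (default max chunk size)
var docSizeBytes = 43;                 // Object.bsonsize() of the sample document
var docsPerChunk = Math.floor(chunkSizeBytes / docSizeBytes);
// docsPerChunk is roughly 1.56 million, far beyond the 10,000 inserted
```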
If you just want to test how sharding works you could:
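For example, two common approaches, shown here as a sketch to run in the mongo shell against the mongos router (the 1 MB value is for testing only, and the { name: "M" } split point is an arbitrary illustration):

```javascript
// Run in the mongo shell connected to mongos (192.168.212.173:27020).

// Option 1: lower the maximum chunk size to 1 MB so splits are
// triggered with far less data (testing only, not for production):
db.getSiblingDB("config").settings.save({ _id: "chunksize", value: 1 })

// Option 2: pre-split the collection manually at a shard key value,
// so the balancer has more than one chunk to migrate:
sh.splitAt("testDB.testData", { name: "M" })

// Afterwards, sh.status() should list more than one chunk for testDB.testData.
```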