Mongodb – Zoned and hashed sharding of existing, non-empty collection

mongodbsharding

I am trying to shard a existing collection. Collection is pretty big (about 100gb). Sharding should be done by 2 fields. First one is zone ['Europe', 'America'] and second one is _id.
So my sharding index looks like this {"zone": 1, "_id": "hashed"}.

I have followed these steps https://docs.mongodb.com/manual/reference/method/sh.updateZoneKeyRange/#compound-hashed-shard-key-with-non-prefix-hashed-field. This works only on empty or non existing collection. My question is is this type of sharding possible on collection that is not empty.

I am using mongo version 4.4

Best Answer

For me it is working. You only need to create the index { "region": 1, "_id": "hashed" } manually.

use data
db.collection.insertMany([{ region: 'EU' }, { region: 'US' }])

sh.addShardToZone("shard_01", "Europe")
sh.addShardToZone("shard_02", "America")

sh.updateZoneKeyRange(
   "data.collection",
   { "region": "EU", "_id": MinKey },
   { "region": "EU", "_id": MaxKey },
   "Europe"
)

sh.updateZoneKeyRange(
   "data.collection",
   { "region": "US", "_id": MinKey },
   { "region": "US", "_id": MaxKey },
   "America"
)

db.collection.createIndex({ "region": 1, "_id": "hashed" })
sh.shardCollection(
   "data.collection",
   { "region": 1, "_id": "hashed" }
)

Extract from sh.status():

data.collection
  shard key: { "region" : 1, "_id" : "hashed" }
  unique: false
  balancing: true
  chunks:
          shard_01  2
          shard_02  1
          shard_03  1
          shard_04  1
  { "region" : { "$minKey" : 1 }, "_id" : { "$minKey" : 1 } } -->> { "region" : "EU", "_id" : { "$minKey" : 1 } } on : shard_04 Timestamp(3, 0) 
  { "region" : "EU", "_id" : { "$minKey" : 1 } } -->> { "region" : "EU", "_id" : { "$maxKey" : 1 } } on : shard_01 Timestamp(4, 1) 
  { "region" : "EU", "_id" : { "$maxKey" : 1 } } -->> { "region" : "US", "_id" : { "$minKey" : 1 } } on : shard_03 Timestamp(4, 0) 
  { "region" : "US", "_id" : { "$minKey" : 1 } } -->> { "region" : "US", "_id" : { "$maxKey" : 1 } } on : shard_02 Timestamp(2, 0) 
  { "region" : "US", "_id" : { "$maxKey" : 1 } } -->> { "region" : { "$maxKey" : 1 }, "_id" : { "$maxKey" : 1 } } on : shard_01 Timestamp(1, 5) 
   tag: Europe  { "region" : "EU", "_id" : { "$minKey" : 1 } } -->> { "region" : "EU", "_id" : { "$maxKey" : 1 } }
   tag: America  { "region" : "US", "_id" : { "$minKey" : 1 } } -->> { "region" : "US", "_id" : { "$maxKey" : 1 } }

Please be aware of Balancing failed when moving big collection, my collection is also around 100GB, so you may face the same issue.