MongoDB – Mongo zoned sharding with a single value

mongodb, sharding

We have a database with around 50 collections, and 70 million records. All records belong to specific customers and have a customerId property. Currently all clients are in the USA. However, we're in the process of adding EU clients and would like to host their data in an EU datacenter.

Since all records have a customerId (which is a string), and virtually all of our queries also specify this customerId, it looks like a good choice of shard key for us. We're only going to have one shard per zone, and we'd like all of a customer's data to be in that one shard.

My question is: given that the customerIds are strings, how do we specify a minimum and maximum for the sh.updateZoneKeyRange() function? Obviously the minimum will be the customerId (e.g., "customer-somename"), but how do we specify a maximum of "customer-somename" + 1? The issue here is that the maximum is exclusive, so it can't be the same as the minimum.

Best Answer

Assuming your customerId does not have any obvious mapping to location, it would be best to add a location field and create a compound shard key. The location field should be the prefix of the shard key; otherwise you will have an administrative headache trying to maintain zones for small ranges of customerId values (which could end up as granular as a single customer).

With a compound shard key of { location: 1, customerId: 1 }, your zone ranges would be straightforward to define.

For example:

sh.updateZoneKeyRange(
  "mydb.mycollection",
  { "location" : "US", "customerId" : MinKey },
  { "location" : "US", "customerId" : MaxKey },
  "US"
)
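
If you end up with several locations (for example, multiple country codes that all belong to one EU zone), it may be easier to generate the ranges than to type each one. The following is only a sketch: buildZoneRanges is a hypothetical helper, not a MongoDB function, and the namespace, location codes and zone names are assumptions carried over from the example above. The MinKey/MaxKey values are passed in as arguments so the helper itself is plain JavaScript.

```javascript
// Hypothetical helper (not part of MongoDB): build one zone range per location
// for a compound shard key of { location: 1, customerId: 1 }.
// zonesByLocation maps a location value to the zone that should hold it.
function buildZoneRanges(namespace, zonesByLocation, minKey, maxKey) {
  return Object.entries(zonesByLocation).map(([location, zone]) => ({
    namespace: namespace,
    // Lower bound is inclusive: every customerId within this location.
    min: { location: location, customerId: minKey },
    // Upper bound is exclusive, so MaxKey covers all remaining customerIds.
    max: { location: location, customerId: maxKey },
    zone: zone,
  }));
}

// Assumed usage in the mongo shell, once each shard has been assigned
// to its zone with sh.addShardToZone():
//
//   buildZoneRanges("mydb.mycollection", { US: "US", DE: "EU", FR: "EU" }, MinKey, MaxKey)
//     .forEach((r) => sh.updateZoneKeyRange(r.namespace, r.min, r.max, r.zone));
```

Because each range pins location to a single value and spans customerId from MinKey to MaxKey, you never need to work out "customerId + 1" for an individual customer.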

The MongoDB manual includes a similar example in Segmenting Data by Location, and this is the same approach used for MongoDB Atlas' Global Clusters feature. Both of these examples suggest two-character country codes (ISO 3166-1 alpha-2) for the location field, but you could choose any values which provide suitable granularity for your future use cases. For example, ISO 3166-2 would provide for countries and subdivisions (administrative regions like state or province).

Starting in MongoDB 4.2, you can also change a document's shard key value if you need to move existing user data between zones. You presumably would never want to change your customerId values, but could update a location field.
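
As a rough illustration of what moving a customer might look like, here is a sketch. moveCustomer is a hypothetical helper, and the field names and zone values are the assumed ones from the example above. It updates one document at a time because a shard key value cannot be changed with a multi-document update; the filter includes the full shard key, and the update is expected to run through mongos as a retryable write or inside a transaction.

```javascript
// Hypothetical helper: re-home all documents of one customer by rewriting the
// `location` prefix of the shard key, one document per update.
// `coll` is assumed to be a shell collection object such as db.mycollection.
function moveCustomer(coll, customerId, fromLocation, toLocation) {
  // Collect the matching documents first so the loop does not depend on a
  // cursor over documents that are being moved.
  const ids = coll
    .find({ location: fromLocation, customerId: customerId }, { _id: 1 })
    .toArray()
    .map((doc) => doc._id);

  let moved = 0;
  for (const id of ids) {
    // The filter carries the full shard key (location + customerId) plus _id,
    // so each update targets exactly one document.
    const result = coll.updateOne(
      { _id: id, location: fromLocation, customerId: customerId },
      { $set: { location: toLocation } }
    );
    moved += result.modifiedCount;
  }
  return moved;
}

// Assumed usage in the shell:
//   moveCustomer(db.mycollection, "customer-somename", "US", "EU");
```

With zone ranges defined on location, the balancer would then be responsible for the data ending up on a shard in the new zone. For a customer with many documents, you would probably want to batch this and run it during a quiet period.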