MongoDB Using String-Type Shard Key for Zone Ranges

Tags: mongodb, sharding, spatial

MongoDB lets a field of any type define the ranges for shard zones, but the documentation doesn't specify how range comparisons are evaluated for string values.

My goal is to point queries and writes at exactly the right server, WITHOUT using a compound shard key (to avoid various implementation complexities).

I want to embed a unique user alpha-numeric ID, plus a latitude and longitude value into a single string, and use this as my shard key.

Ex: (ID.lat.lon) "4kjaj29.48.-89"

Would fall within the range:

"0000000.45.-100" to "zzzzzzz.50.-80"

MongoDB stores strings as UTF-8 (see https://www.utf8-chartable.de/unicode-utf8-table.pl), in which, among alphanumeric characters, 0 is the "least" and z is the "greatest".

So I have to imagine this is how range inclusion is computed? Does anyone know off-hand if this approach is correct? And if not, can you point me in the right direction to accomplish this?

There is definitely a way to accomplish this, since some kind of range-inclusion calculation is performed for string types; it just requires knowing how that comparison is done.

Best Answer

As per the MongoDB documentation, in sharded clusters you can create zones of sharded data based on the shard key. You can associate each zone with one or more shards in the cluster. A shard can associate with any number of zones. In a balanced cluster, MongoDB migrates chunks covered by a zone only to those shards associated with the zone.

Some common deployment patterns where zones can be applied are as follows:

Isolate a specific subset of data on a specific set of shards.

Ensure that the most relevant data reside on shards that are geographically closest to the application servers.

Route data to shards based on the hardware / performance of the shard hardware.

For example

[Figure: a sharded cluster with three shards and two zones]

The figure above shows a sharded cluster with three shards and two zones. The A zone represents a range with a lower bound of 1 and an upper bound of 10. The B zone represents a range with a lower bound of 10 and an upper bound of 20. Shards Alpha and Beta have the A zone. Shard Beta also has the B zone. Shard Charlie has no zones associated with it. The cluster is in a steady state and no chunks violate any of the zones.

After configuring a zone with a shard key range and associating it with a shard or shards, the cluster may take some time to migrate the affected data. This depends on the division of chunks and the current distribution of data in the cluster. When balancing is complete, reads and writes for documents in a given zone are routed only to the shard or shards inside that zone.
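To make the range semantics of the example concrete, here is a minimal Python sketch. It assumes that a zone range includes its lower bound and excludes its upper bound, and that chunks outside every zone range may live on any shard. The names ZONES, SHARD_ZONES, zone_for and eligible_shards are made up for illustration and are not part of any MongoDB API.

```python
from typing import List, Optional

# Zone ranges from the example: lower bound inclusive, upper bound exclusive.
ZONES = {
    "A": (1, 10),
    "B": (10, 20),
}

# Zones associated with each shard (Charlie has none).
SHARD_ZONES = {
    "Alpha": {"A"},
    "Beta": {"A", "B"},
    "Charlie": set(),
}


def zone_for(key: int) -> Optional[str]:
    """Return the zone whose range [lower, upper) contains key, if any."""
    for zone, (lower, upper) in ZONES.items():
        if lower <= key < upper:
            return zone
    return None


def eligible_shards(key: int) -> List[str]:
    """Shards that may hold the given key once balancing has finished."""
    zone = zone_for(key)
    if zone is None:
        # Assumption: data outside every zone range can sit on any shard.
        return sorted(SHARD_ZONES)
    return sorted(s for s, zones in SHARD_ZONES.items() if zone in zones)


print(eligible_shards(5))   # key in zone A
print(eligible_shards(10))  # 10 is the start of zone B, not the end of zone A
print(eligible_shards(25))  # outside both zones
```

Note that the key 10 belongs to zone B, not zone A, because the upper bound of a range is exclusive.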

Shard Key

You must use fields contained in the shard key when defining a new range for a zone to cover. If using a compound shard key, the range must include the prefix of the shard key.

For example, given a shard key { a : 1, b : 2, c : 3 }, creating or updating a zone to cover values of b requires including a as the prefix. Creating or updating a zone to cover values of c requires including a and b as the prefix.
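The prefix rule can be expressed as a small check. This is only an illustration of the rule as stated above, not MongoDB's own validation logic; the function name covers_prefix is hypothetical.

```python
from typing import Sequence


def covers_prefix(shard_key_fields: Sequence[str], range_fields: Sequence[str]) -> bool:
    """True if range_fields is a non-empty leading prefix of shard_key_fields."""
    n = len(range_fields)
    if n == 0 or n > len(shard_key_fields):
        return False
    return list(shard_key_fields[:n]) == list(range_fields)


SHARD_KEY = ["a", "b", "c"]

print(covers_prefix(SHARD_KEY, ["a", "b"]))  # zone on b, with a as prefix
print(covers_prefix(SHARD_KEY, ["b"]))       # zone on b alone, prefix a missing
print(covers_prefix(SHARD_KEY, ["a", "c"]))  # skips b, so not a prefix
```

Only the first call satisfies the rule; the other two omit part of the prefix.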

You cannot create zones using fields not included in the shard key. For example, if you wanted to use zones to partition data based on geographic location, the shard key would need at least one field that contained geographic data.
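Applied to the concatenated string key from the question, the sketch below shows how a string range behaves. It assumes strings are ordered by a simple binary (byte-wise) comparison of their UTF-8 encoding, which is MongoDB's default string ordering when no other collation is in play. The helper in_zone is hypothetical and only mimics a lower-inclusive, upper-exclusive range check.

```python
def in_zone(key: str, lower: str, upper: str) -> bool:
    """Byte-wise check of lower <= key < upper on the UTF-8 encodings."""
    k, lo, hi = (s.encode("utf-8") for s in (key, lower, upper))
    return lo <= k < hi


LOWER = "0000000.45.-100"
UPPER = "zzzzzzz.50.-80"

# The example key from the question: inside the range.
print(in_zone("4kjaj29.48.-89", LOWER, UPPER))  # True

# Latitude 10 is outside 45..50, yet the key is still inside the range,
# because the comparison is already decided by the first character.
print(in_zone("4kjaj29.10.0", LOWER, UPPER))    # True

# Only when the ID part equals a bound does the latitude part matter.
print(in_zone("0000000.44.-90", LOWER, UPPER))  # False
```

Under this assumption the comparison is decided at the first differing character, so a range whose bounds differ in the ID part does not constrain the latitude or longitude that follow it. Unpadded numbers are also compared as text, so "9" sorts after "10".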

When choosing a shard key for a collection, consider what fields you might want to use for configuring zones. After sharding, you cannot change the shard key. See Choosing a Shard Key for considerations in choosing a shard key.

Hashed Shard Keys and Zones

When using zones on a hashed shard key, each zone covers the hashed shard key values. Given a shard key { a : 1 } and a zone alpha with a lower bound of 1 and an upper bound of 5, the bounds represent the hashed value of a, and not the actual value. Therefore, there is no guarantee that MongoDB routes documents where a has a value from 1 to 5 to zone alpha. MongoDB routes any document whose hashed shard key value falls within the range of 1 to 5 to a shard inside zone alpha.

In general, a zone covering a sequential range of hashed shard key values may exhibit unexpected behavior.
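A rough sketch of why this happens is shown below. MongoDB's real hashing function is not reproduced here; stand_in_hash is a hypothetical substitute built on MD5, used only to show that hashed values bear no relation to the original values.

```python
import hashlib
import struct


def stand_in_hash(value: int) -> int:
    """Hypothetical stand-in for a hashed shard key value (NOT MongoDB's function).

    Takes the first 8 bytes of the MD5 digest of the decimal text and reads
    them as a signed 64-bit integer.
    """
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return struct.unpack("<q", digest[:8])[0]


ZONE_LOWER, ZONE_UPPER = 1, 5  # bounds of zone alpha, applied to hashed values

for a in range(1, 6):
    h = stand_in_hash(a)
    # Print the raw value, its stand-in hash, and whether the hash is in the zone.
    print(a, h, ZONE_LOWER <= h < ZONE_UPPER)
```

With this stand-in, none of the documents with a between 1 and 5 hash into the zone's range, which is the kind of mismatch the paragraph above warns about.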

It is possible to create a zone that covers the entire range of shard key values by using MinKey and MaxKey as its bounds. This guarantees that MongoDB restricts all the data for a specific collection to the shard or shards in that zone.

For further reference, see here, here, here and here.