MongoDB sharding by monotonic id with zones for archiving

archivemongodbsharding

Situation:

I have an ever-growing collection of documents which do have monotonically increasing unique id.
All queries are direct lookup for documents with specified _id (no range queries nor queries on other fields).
Queries on newer documents are more frequent than older documents.
The workload is both read and write heavy (newer data).

Goals:

  1. distributing reads/writes for the newest data onto multiple shards
  2. ability to move older data to nodes for archive with cheaper hardware

Considerations/options:

All queries use _id and cardinality is good (since values are unique) shard key should be _id.

  1. Use shard key {_id:1}
    • Problem is, of course, that shard (current) with newest data (biggest ids) will be hot since all new writes will hit it and most reads will also hit it.
    • A benefit with this setup is that I can periodically add a tag to new ranges, and then later (in a couple of months or a year) assign tag zone to shards on cheaper hardware so that whole range of data can be transparently moved onto archiving nodes.

Simple outline of shards

+---------+    +------------+    +-------------+
|HDD      |    |HDD         |    |SSD          |
|archive1 |    |archive2    |    |current      |
|id:[0,99]|    |id:[100,199]|    |id:[200,inf] |
+---------+    +------------+    +-------------+

After time goes by, new machine is added and ranges are changed and data is moved

                                   new machine
+----------+    +------------+    +-------------+    +------------+
|HDD       |    |HDD         |    |HDD          |    |SSD         |
|archive1  |    |archive2    |    |archive3     |    |current     |
|id:[0,100]|    |id:[100,200]|    |id:[200,300] |    |id:[300,inf]|
+----------+    +------------+    +-------------+    +------------+
  1. Use shard key {_id:hashed}
    • A benefit of this kind of sharding is that writes and reads are evenly distributed
    • Problem is that now it's not possible to add a tag to a range of ids which would be assigned to an archive server. Cause of this problem is that range tag is not applied to original id, but rather on hashed value of id, as stated in documentation Hashed Shard Keys and Zones

Outline of evenly distributed current shards (assume equal ranges, -100 and 100 is just placeholder)

+--------------------+
|SSD                 |
|current1            |
|hash(id):[-inf,-100]|
+--------------------+
+--------------------+
|SSD                 |
|current2            |
|hash(id):[-100,100] |
+--------------------+
+--------------------+
|SSD                 |
|current3            |
|hash(id):[100,inf]  |
+--------------------+
  1. Shard key which is some combination of compound key

    • I was thinking that if I could make a benefit from compound key but can't figure out how to achieve the desired behavior
    • I was considering to use shard key something like {id:1,hashedId:1} where hashedId would be hash(id) calculated at client side

Desired outline:

A goal is to have shards with HDD which would store older data and shards with SSD which would handle most read and write operations.

+----------+  .  +------------+  .  +--------------------+
|HDD       |  .  |HDD         |  .  |SSD                 |
|archive1  |  .  |archive2    |  .  |current1            |
|id:[0,100]|  .  |id:[100,200]|  .  |hash(id):[-inf,-100]|
+----------+  .  +------------+  .  +--------------------+
              .                  .  +--------------------+
              .                  .  |SSD                 |
              .                  .  |current2            |
              .                  .  |hash(id):[-100,100] |
              .                  .  +--------------------+
              .                  .  +--------------------+
              .                  .  |SSD                 |
              .                  .  |current3            |
              .                  .  |hash(id):[100,inf]  |
              .                  .  +--------------------+
              .                  .
 id:[0,100]   .   id:[100,200]   .       id:[200,inf]

My current workaround

Now, I have manually written a wrapper around mongo client at application level which routes requests to appropriate mongo server and thus simulating this behavior. Currently, those servers are not connected into a cluster. Problem with the current workaround is operational complexity when changes are made (change routing config, restart the application(s), …). This is the reason I would move that logic to the database level.

Question

What would be a way to choose a shard key to achieve the desired architecture?
Or, in general, when using monotonic id, how to achieve architecture so that new data is written to evenly distributed among set of shards and as time goes by and data gets older, to be able to move this data to archive nodes initiated by DB admin and all of that transparent to the application.

Best Answer

Unfortunately with a fully monotonic shard key, you cannot remove hot shards entirely. However, you can somewhat mitigate its effect by inserting into a faster shard. You would need to determine if a hot shard or archival using document age is more important for you, since they are pretty much mutually exclusive if you have to use a single field as the shard key.

Another solution is to use a compound shard key, but this will complicate your queries.

From your description, if you need to have only _id as the shard key, one possible workaround is to use zone sharding. Zone sharding allows you to assign shards to specific "zones" based on the shard key. The balancer will then automatically move chunks as required so that the chunks with the specific shard key will reside in some specific shard (or shards).

To illustrate, suppose you use {_id: 1} as your shard key, and you have 4 shards of to store archive and current document:

  1. Disable the balancer.

  2. Create zone tags according to the zones you need and the shards you have, e.g.:

    sh.addShardTag("shard0000", "archive1")
    sh.addShardTag("shard0001", "archive2")
    sh.addShardTag("shard0002", "current")
    sh.addShardTag("shard0003", "current")
    

    Note that you can assign multiple zones into a shard, and multiple shards into a zone.

  3. Determine that _id:MinKey to _id:200 should be located in archive1 and archive2:

    sh.addTagRange("db.coll", { _id: MinKey }, { _id: 100 }, "archive1")
    sh.addTagRange("db.coll", { _id: 100 }, { _id: 200 }, "archive2")
    
  4. Determine that _id:200 to _id:MaxKey should be located in current:

    sh.addTagRange("db.coll", { _id: 200 }, { _id: MaxKey }, "current")
    
  5. Enable the balancer.

The balancer will split and rebalance the collection according to the zones you specify. Note that this would take some time & resources if you have a non-empty collection.

At some point, you might want to change the boundaries of the zones. For this purpose, the manual page Tiered Hardware for Varying SLA or SLO contains a description of using zone sharding in a situation similar to what you described, under the section Updating zone ranges.