MongoDB – How to Choose a Randomly Distributed Shard Key

mongodb sharding

In my reading I came across the following statement about choosing randomly distributed shard keys, but I don't understand why it is true. Could someone explain it in detail?

"The only downside to randomly distributed shard keys is that MongoDB isn’t efficient at randomly accessing data beyond the size of RAM."

Thank you.

Best Answer

While it's hard to say for certain without full context, I assume it's referring to the need to keep the working set in memory.

A randomly distributed shard key spreads the workload across the entire shard key index, so the whole index needs to fit in memory to handle the workload efficiently. Performance deteriorates once that index on a shard grows larger than RAM, because index pages then have to be paged in and out of memory (page faults) to serve requests.

In contrast, a non-random shard key may have a "hot" subset that accounts for most of the working set. For example, consider a website where only newer "posts" by users are frequently accessed and older "posts" are rarely accessed. While the indexes on "posts" may be larger than available memory, only the frequently accessed subsets of the indexes need to fit in memory, reducing memory pressure and the likelihood of page faults.
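To make the difference concrete, here is a toy simulation in Python. It is only a sketch under stated assumptions: it models memory as a plain LRU cache of index pages, which is not how MongoDB's storage engine actually manages its cache, and every number in it (index size, cache size, the 90% "hot" share) is made up for illustration. It replays two access patterns against a cache that holds a fifth of the index: uniformly random keys, and keys where most reads hit the newest pages.

```python
import random
from collections import OrderedDict


def hit_rate(accesses, cache_pages):
    """Replay page accesses against an LRU cache; return the fraction of hits."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for page in accesses:
        total += 1
        if page in cache:
            hits += 1
            cache.move_to_end(page)        # mark as most recently used
        else:
            cache[page] = True             # "page fault": load the page
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict the least recently used page
    return hits / total if total else 0.0


def random_key_accesses(rng, index_pages, n):
    """Random shard key: every index page is equally likely to be touched."""
    return [rng.randrange(index_pages) for _ in range(n)]


def recent_key_accesses(rng, index_pages, n, hot_pages, hot_fraction):
    """Recency-skewed key: hot_fraction of reads hit the newest hot_pages pages."""
    accesses = []
    for _ in range(n):
        if rng.random() < hot_fraction:
            accesses.append(index_pages - 1 - rng.randrange(hot_pages))
        else:
            accesses.append(rng.randrange(index_pages))
    return accesses


# Hypothetical sizes: the index is five times larger than the cache.
INDEX_PAGES = 10_000
CACHE_PAGES = 2_000
N_ACCESSES = 100_000

rng = random.Random(42)
random_rate = hit_rate(
    random_key_accesses(rng, INDEX_PAGES, N_ACCESSES), CACHE_PAGES
)
recent_rate = hit_rate(
    recent_key_accesses(rng, INDEX_PAGES, N_ACCESSES, 1_000, 0.9), CACHE_PAGES
)

print(f"random key hit rate: {random_rate:.2f}")
print(f"recent key hit rate: {recent_rate:.2f}")
```

With the random pattern the hit rate stays close to the cache-to-index ratio, so most reads miss the cache. With the skewed pattern the hot pages stay resident and most reads are served from memory, even though the index as a whole is the same size in both runs.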