MongoDB Hashed Shard Key Selection

mongodb sharding

I have a collection holding about 80M very simple documents called "Item":

{_id: <ObjectId>, playerId: int, isRead: bool, content: string}

Each Item belongs to a player, identified by a monotonically incrementing playerId. Each player has up to 1,000 documents in the collection.

I want to shard the collection on a hashed playerId, since the read queries filter by playerId. This way, I believe each query will be routed efficiently to the right shard in the cluster, which then returns the relevant documents (at most 1,000).
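The routing idea can be illustrated outside MongoDB with a small Node.js simulation. This is only a sketch under stated assumptions: the MD5-modulo function is a stand-in and not MongoDB's actual hashing or chunk assignment, and the shard count, range boundaries, and id values are made up for illustration. It shows two things: a given playerId always maps to the same shard, and newly created, monotonically increasing ids spread across shards when hashed but pile onto one shard when ranged.

```javascript
const crypto = require("crypto");

// Illustrative stand-in for a hashed shard key: first 4 bytes of MD5(id),
// reduced modulo the shard count. NOT MongoDB's real hash or chunk logic.
function shardForHashed(playerId, shardCount) {
  const digest = crypto.createHash("md5").update(String(playerId)).digest();
  return digest.readUInt32BE(0) % shardCount;
}

// Ranged sharding on the raw key: each shard owns a contiguous id range,
// and the last shard owns everything above the final boundary.
function shardForRanged(playerId, upperBounds) {
  for (let i = 0; i < upperBounds.length; i++) {
    if (playerId < upperBounds[i]) return i;
  }
  return upperBounds.length;
}

// Count how many ids each shard receives under a given routing function.
function countByShard(ids, shardCount, pick) {
  const counts = new Array(shardCount).fill(0);
  for (const id of ids) counts[pick(id)]++;
  return counts;
}

const SHARDS = 4; // assumed cluster size
// 10,000 newly created players with increasing ids (hypothetical values).
const newPlayers = Array.from({ length: 10000 }, (_, i) => 80000 + i);

const hashedCounts = countByShard(newPlayers, SHARDS, (id) =>
  shardForHashed(id, SHARDS)
);
const rangedCounts = countByShard(newPlayers, SHARDS, (id) =>
  shardForRanged(id, [20000, 40000, 60000])
);

console.log("hashed:", hashedCounts); // roughly even across the 4 shards
console.log("ranged:", rangedCounts); // every new player lands on the last shard

// The same playerId always routes to the same shard, so all of a player's
// documents live together and a query by playerId can target one shard.
console.log(shardForHashed(12345, SHARDS) === shardForHashed(12345, SHARDS));
```

In this sketch the non-uniqueness of playerId does not affect routing: every document sharing a playerId hashes to the same place, which is what lets a single shard answer the query.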

I was advised not to use a hashed playerId because it is not unique, and a query might retrieve a large number of documents, since isRead is not included in the index. Queries that filter by both playerId and isRead might therefore be inefficient.

Is this indeed a problem? Are there any other considerations to think of?

Best Answer

Why not shard on a custom _id value, since it's guaranteed to be unique? With at most around 1,000 documents per player, an index on playerId should give good performance as well.

Still, do some testing, of course, because performance can vary with the number of shards and the overall size of the collection.
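One way to run such a test for the hashed playerId option from the question is sketched below for mongosh. The database name "game", the collection name "items", and the sample playerId value are assumptions for illustration. The compound index on playerId and isRead is one possible way to address the concern that isRead is not part of the shard key index; whether it helps should be checked against the explain output on real data.

```javascript
// mongosh sketch; "game" and "items" are assumed names, not from the question.
sh.enableSharding("game");

const items = db.getSiblingDB("game").items;

// A non-empty collection needs an index on the shard key before sharding.
items.createIndex({ playerId: "hashed" });
sh.shardCollection("game.items", { playerId: "hashed" });

// Secondary compound index so that queries filtering on playerId and isRead
// do not have to examine all of a player's documents.
items.createIndex({ playerId: 1, isRead: 1 });

// Inspect how the query ran: compare totalDocsExamined with nReturned,
// and check whether the query was routed to a single shard.
const plan = items
  .find({ playerId: 12345, isRead: false }) // 12345 is a made-up id
  .explain("executionStats");
printjson(plan.executionStats);
```

Running the same explain with and without the compound index, and against a copy sharded on _id instead, gives a direct comparison for the data and shard count at hand.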


I found the video I had been looking for, from the MongoDB University M102 course: "Cardinality & Monotonic Shard Keys". It provides some more insight.