MongoDb Join VS Nested Query

mongodbperformance

I'm trying to build a game server and we are using MongoDb as the database.

We have a collection Games that has models like this:

{
    "_id" : ObjectId("5d1b345b8ea742034db76431"),
    "Users" : [
        {
            "_id" : "0e76bd95-a7c2-4e15-b2bc-4cdf741fe9aa",
            "UserName" : "l98lNhLHPh",
            "Cards" : [
                249
            ]
        },
        {
            "_id" : "6ffec61d-45cc-46fa-a1f0-2a3e0a03fef6",
            "UserName" : "Vun12sWp4W",
            "Cards" : [
                234
            ]
        }
    ],
    "CreatedAt" : ISODate("2019-07-02T15:09:23.303+04:30"),
    "UpdatedAt" : ISODate("2019-07-02T15:09:23.674+04:30"),
    "IsFinished" : false
}

We can easily query users that participated in a game, but what if we need to know the games that a special user had played before?

We reached two solutions but we don't know which one is better.

First, query using nested field with an index on Users.UserName:
db.games.find({"Users.UserName":"Amin"})

Second, create a new collection that holds the User and Games data.

// collection: UserGames
{
    "_id" : ObjectId("5d1b35c58ea742034db79fea"),
    "UserId" : "6ffec61d-45cc-46fa-a1f0-2a3e0a03fef6",
    "GameId" : ObjectId("5d1b35c58ea742034db79fe9")
}

and join Games and UserGames collections to find the which user played which games.

I came from a Relational database mindset and I don't know which approach should I take?

Best Answer

The canonical answer is to create sample data using both schemas, and profile the queries you'll be doing to see which schema runs faster :-)

But I'm guessing you'll actually see equivalent performance. Mongo is pretty fast at querying subdocuments as long as you index the subfields (which you're doing in the first option).

So the real way to decide between the two approaches is to really understand your data and your query / access patterns. For example, one advantage of creating a join table is that you could move all the additional data (like 'UserName' and 'cards') into the join table. That shrinks your game records. This might be useful if you'll be having lots of game servers creating tons of game data in realtime, and where different users are hosted on different servers, which means updating user data as the game happens can lead to locking and contention issues if all of the user data is stored in one big game record. In that scenario, a join table makes better sense, because each user-game combination is a separate record which can be updated separately (and can even be sharded onto separate servers if your game becomes really popular :-) without needing to lock other records.

On the other hand, if you don't foresee needing to do a ton of writes and updates to user-game data in realtime, then keeping all of that data as subdocuments can make queries easier to write.

At the end of the day, it really does depend on what your access patterns will be like. The raw read performance will likely be equivalent, but that's only one factor in determining what's optimal in your scenario.