MongoDB Join vs. Nested Query

mongodb performance

I'm building a game server, and we are using MongoDB as the database.

We have a Games collection whose documents look like this:

{
    "_id" : ObjectId("5d1b345b8ea742034db76431"),
    "Users" : [
        {
            "_id" : "0e76bd95-a7c2-4e15-b2bc-4cdf741fe9aa",
            "UserName" : "l98lNhLHPh",
            "Cards" : [
                249
            ]
        },
        {
            "_id" : "6ffec61d-45cc-46fa-a1f0-2a3e0a03fef6",
            "UserName" : "Vun12sWp4W",
            "Cards" : [
                234
            ]
        }
    ],
    "CreatedAt" : ISODate("2019-07-02T15:09:23.303+04:30"),
    "UpdatedAt" : ISODate("2019-07-02T15:09:23.674+04:30"),
    "IsFinished" : false
}

We can easily query the users who participated in a game, but what if we need to know which games a specific user has played before?

We came up with two solutions, but we don't know which one is better.

First, query the nested field directly, with an index on Users.UserName:
db.games.find({"Users.UserName":"Amin"})
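A minimal sketch of this first option, assuming the collection is named games as in the query above. The helper name gamesByUserName is made up for illustration. The db guard means the calls only run inside mongosh; in plain Node.js the block just builds the objects.

```javascript
// Index on the nested field. Because Users is an array, MongoDB builds a
// multikey index with one entry per array element.
const userNameIndex = { "Users.UserName": 1 };

// Hypothetical helper: filter matching any game whose Users array
// contains an element with this UserName.
function gamesByUserName(userName) {
  return { "Users.UserName": userName };
}

// `db` only exists inside mongosh, so this part is skipped elsewhere.
if (typeof db !== "undefined") {
  db.games.createIndex(userNameIndex);
  printjson(db.games.find(gamesByUserName("Amin")).toArray());
}
```

Since user names can change, the same pattern with "Users._id" as the indexed field may be the safer key.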

Second, create a new collection that holds the User and Games data.

// collection: UserGames
{
    "_id" : ObjectId("5d1b35c58ea742034db79fea"),
    "UserId" : "6ffec61d-45cc-46fa-a1f0-2a3e0a03fef6",
    "GameId" : ObjectId("5d1b35c58ea742034db79fe9")
}

and then join the Games and UserGames collections to find out which user played which games.

I come from a relational database mindset, and I don't know which approach I should take.
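The join in this second option could be sketched with the aggregation $lookup stage. This assumes the collections are named UserGames and games (the question spells the latter both ways, and collection names are case-sensitive). The function name gamesForUser and the output field Game are made up for illustration.

```javascript
// Hypothetical helper: builds a pipeline that starts from UserGames,
// keeps one user's rows, and pulls in the matching game documents.
function gamesForUser(userId) {
  return [
    { $match: { UserId: userId } },
    {
      $lookup: {
        from: "games",        // assumed collection name
        localField: "GameId",
        foreignField: "_id",
        as: "Game",           // array of matching games (0 or 1 here)
      },
    },
    { $unwind: "$Game" },                    // one document per joined game
    { $replaceRoot: { newRoot: "$Game" } },  // return plain game documents
  ];
}

// `db` only exists inside mongosh, so this part is skipped elsewhere.
if (typeof db !== "undefined") {
  db.UserGames.createIndex({ UserId: 1 }); // lets the $match use an index
  const userId = "6ffec61d-45cc-46fa-a1f0-2a3e0a03fef6";
  printjson(db.UserGames.aggregate(gamesForUser(userId)).toArray());
}
```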

Best Answer

The canonical answer is to create sample data using both schemas, and profile the queries you'll be doing to see which schema runs faster :-)
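One way to do that profiling is explain("executionStats"), which reports how much work a query did. The sketch below assumes an unsharded deployment and the games collection from the question. The helper name summarize is made up, and only the embedded-document query is shown, because the explain output of an aggregation has a different shape.

```javascript
// Hypothetical helper: pick the most telling numbers out of an
// explain("executionStats") result for a find() query.
function summarize(explainOutput) {
  const stats = explainOutput.executionStats;
  return {
    millis: stats.executionTimeMillis,     // server-side execution time
    keysExamined: stats.totalKeysExamined, // index entries scanned
    docsExamined: stats.totalDocsExamined, // documents fetched
    returned: stats.nReturned,             // documents actually returned
  };
}

// `db` only exists inside mongosh, so this part is skipped elsewhere.
if (typeof db !== "undefined") {
  const plan = db.games
    .find({ "Users.UserName": "Amin" })
    .explain("executionStats");
  printjson(summarize(plan));
}
```

If docsExamined is far larger than returned, the query is probably not using the index you expected.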

But I'm guessing you'll actually see equivalent performance. Mongo is pretty fast at querying subdocuments as long as you index the subfields (which you're doing in the first option).

So the real way to decide between the two approaches is to understand your data and your query and access patterns. For example, one advantage of creating a join table is that you could move all the additional per-user data (like 'UserName' and 'Cards') into the join table. That shrinks your game records. This might be useful if you'll have lots of game servers creating tons of game data in real time, with different users hosted on different servers. In that case, updating user data as the game happens can lead to locking and contention issues if all of the user data is stored in one big game record. A join table makes better sense in that scenario, because each user-game combination is a separate record that can be updated separately without needing to lock other records (and can even be sharded onto separate servers if your game becomes really popular :-).
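To make the difference concrete, here is a sketch of the same "user draws a card" write under both schemas. It assumes the Cards array has been moved into UserGames as described above, which the question's UserGames example does not yet include. The function names are made up for illustration.

```javascript
// Embedded schema: every player's update targets the same game document.
// The positional `$` refers to the Users element matched by the filter.
function embeddedCardUpdate(gameId, userId, card) {
  return {
    filter: { _id: gameId, "Users._id": userId },
    update: { $push: { "Users.$.Cards": card } },
  };
}

// Join-table schema (assumed to hold Cards): each player's update
// targets that player's own UserGames document.
function joinTableCardUpdate(gameId, userId, card) {
  return {
    filter: { GameId: gameId, UserId: userId },
    update: { $push: { Cards: card } },
  };
}

// `db` and `ObjectId` only exist inside mongosh, so this part is skipped elsewhere.
if (typeof db !== "undefined") {
  const gameId = ObjectId("5d1b345b8ea742034db76431");
  const userId = "0e76bd95-a7c2-4e15-b2bc-4cdf741fe9aa";
  const e = embeddedCardUpdate(gameId, userId, 250);
  db.games.updateOne(e.filter, e.update);
  const j = joinTableCardUpdate(gameId, userId, 250);
  db.UserGames.updateOne(j.filter, j.update);
}
```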

On the other hand, if you don't foresee needing to do a ton of writes and updates to user-game data in realtime, then keeping all of that data as subdocuments can make queries easier to write.

At the end of the day, it really does depend on what your access patterns will be like. The raw read performance will likely be equivalent, but that's only one factor in determining what's optimal in your scenario.

Related Solutions

Mongodb – Best embed/reference/field strategy for many inserts (mongo)

Why do you have a Score model? Why not keep only the User and Match models and give each a score attribute? With that schema you only have the updates per score. If it's due to redundancy of attributes, please note that this is common in document-based schema design.

Mongodb – Searching for array elements nested in MongoDB Documents

Remember, MongoDB has a dynamic schema. So it is perfectly ok to store this document:

{
  "JobNumber" : "50001-01",
  "CustomerId" : "joe",
  "IdentifierNumber" : NumberLong(8812739),
  "TimesPrinted" : 0,
  "Packaging" : {"bundle":1200,"box":120,"pallet":3}
}

and this document

{
  "JobNumber" : "50001-02",
  "CustomerId" : "jane",
  "IdentifierNumber" : NumberLong(8812739),
  "TimesPrinted" : 0,
  "Packaging" : {"sack":200}
}

in the same collection.

I wouldn't query for the Nth document, but for a given field in the subdocument, for example

 db.collection.find({"Packaging.bundle":1200})

which runs just fine with MongoDB. Note that field names are case-sensitive, so the path has to match the stored "Packaging" field exactly. The query works across both documents because a field that isn't present in a document is evaluated as null for a query, and null is definitely not equal to 1200.

As for performance, it really depends on how big your collection is and what your queries look like. While the query shown above may be rather slow without an index in a collection containing hundreds of thousands of documents (or even more), it can be extremely fast once you have created an index on it, e.g.

    db.collection.createIndex({"Packaging.bundle":1,"Packaging.box":1,"Packaging.pallet":1});

Whether you can create an index like this obviously depends on whether you really have arbitrary packaging or simply a variety of packaging options. If the latter is the case, I'd create an index for each of the packaging options, utilizing sparse indexes, e.g.

 db.collection.createIndex({"Packaging.sack":1},{sparse:true})

This would reduce the index size, as only documents which hold the field "Packaging.sack" would be contained in this index.

If you really have arbitrary fields in the documents, I wonder how you create a model for it ;)

With only some tens of thousands of documents, you might even get satisfactory results without an index.
