MongoDB – handling embedded documents and relations

I am researching MongoDB for a web app that I am building. Coming from a MySQL background, I find the concept of embedded documents hard to fully understand.

Let's say I have a document called blogpost and it looks something like this:

db.posts.save({
    _id: 1,
    title: "first post!",
    body: "post content",
    author: {
        _id: 1,
        name: 'John',
        email: 'jonh@doe.com'
    },
    comments: [
        {
            _id: 20,
            author: 'mary',
            content: 'This blog post is cool!'
        }
    ]
});

Each author would actually be stored in the authors collection. When I save the blogpost, I would merely copy the data from the author document and embed it in the blogpost document. Is this a good way to do it?

My concern is that when John updates his e-mail address, it would only be updated in the authors collection. Some of his older blogposts would then show an outdated e-mail address for him.

Does MongoDB have a method for dealing with that issue, or would I need to do it myself in my app code?

If I do that in my app code, then what is the point of embedding the author into the blogpost in the first place? I could just store a reference (the author _id) and look up the author in a separate query.

On the other hand, if I need to store historical data, for example an invoice with customer information, then it would make sense to embed the customer document inside the invoice document since invoices need to show the customer data that existed when the invoice was created.

For the comments part, I have already read about Multiple Collections vs Embedded Documents and when it comes to comments, Multiple Collections seems to be the way to go. http://mongly.com/Multiple-Collections-Versus-Embedded-Documents/

So, in general – am I completely missing the point of embedded documents?

Best Answer

You are not missing the point :)

The key question is how many reads and how many writes you have to do when you store, update, or fetch the various entities in your collections.

You are presumably constantly (daily?) creating new posts and frequently creating new comments, and very frequently querying posts.

Actions like updating an author's email are very rare compared to the above. You want to optimize the reads and writes that happen hundreds or thousands of times in your application, and worry less about the performance of very infrequent actions.
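The rare email change still has to be handled in your app code, because MongoDB does not propagate changes into embedded copies on its own. A minimal sketch of what that could look like, assuming the posts and authors collections from the question; the helper name buildAuthorEmailUpdate and the new address are hypothetical:

```javascript
// Hypothetical helper: builds the filter and update documents for an
// updateMany() call that refreshes the embedded author email in every
// post written by the given author.
function buildAuthorEmailUpdate(authorId, newEmail) {
  return {
    // Dot notation reaches into the embedded author sub-document.
    filter: { "author._id": authorId },
    // $set changes only the email and leaves the rest of the post alone.
    update: { $set: { "author.email": newEmail } }
  };
}

// Assumed example values: author 1 changes to a made-up new address.
const { filter, update } = buildAuthorEmailUpdate(1, "john.new@doe.com");

// In the mongo shell, the two writes would then be:
//   db.authors.updateOne({ _id: 1 }, { $set: { email: "john.new@doe.com" } });
//   db.posts.updateMany(filter, update);
```

The second write touches every post by that author, which is acceptable precisely because it happens so rarely compared to reading posts.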

Having said that, I would embed the basic author information in the post, but only the information that must be displayed with every post. Any additional details about the author I would keep in their own collection; I would expect those to be details that the user has to click an additional link to see, which allows time for another read.
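As a rough sketch of that split, assuming the author's name is the only field shown with every post; the helper embedAuthorSummary and the bio field are made up for illustration:

```javascript
// Hypothetical helper: copies only the fields that are displayed with
// every post (assumed here to be _id and name) out of the full author.
function embedAuthorSummary(author) {
  return { _id: author._id, name: author.name };
}

// Full author document as it would live in the authors collection.
// The bio field is an invented example of a rarely needed detail.
const author = {
  _id: 1,
  name: "John",
  email: "jonh@doe.com",
  bio: "Writes about databases"
};

// The post embeds the summary only; email and bio stay in authors.
const post = {
  _id: 1,
  title: "first post!",
  body: "post content",
  author: embedAuthorSummary(author),
  comments: []
};

// In the mongo shell:
//   db.posts.insertOne(post);                      // one read later renders the post
//   db.authors.findOne({ _id: post.author._id });  // extra read, only on the author page
```

With this layout an email change never touches the posts at all, and only a name change would require updating the embedded copies.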