MongoDB – How unique are MongoDB _id fields?

mongodb schema

I am designing a schema of documents and collections for a new MongoDB database. I have read a few documents describing the _id field that is generated for each document upon creation.

I have also read this article about implementing One-to-N relationships in Mongo and have decided that my case falls into the "One-to-Many" category. As a result, I am going to reference multiple documents in multiple other collections by their _id field from my "current" collection. From what I can tell from the article, this should work just fine. However, I faintly remember reading that _id fields are guaranteed to be unique within a collection but not across different collections (forgive me, I cannot find a link to it).

So, for example, say I have three collections for a company that sells apples and oranges: Orders, Apples, and Oranges. Orders holds documents about orders for apples and oranges. Apples and Oranges hold data about the various types of those fruits.

In an Orders document I might have the fields:

{
    ...
    "OrangesOrdered" :
    [
        { "Amount" : 2, "OrangeID" : "OrangeObjectID12"}
    ]
    ...
    "ApplesOrdered" :
    [
        { "Amount" : 4, "AppleID" : "AppleObjectID65"}
    ]
    ...
}

Where OrangeObjectID12 is the _id field of a specific type of Orange and AppleObjectID65 is the _id field of a specific type of Apple.

Is it possible that these two _id values could ever be the same?

Best Answer

Strictly speaking, _id fields are as unique as you make them. As you have discovered, the default values for _id are ObjectIDs, but you can populate the field with almost any value you wish, as long as it is unique within its collection.

Therefore, for example, you could use a UUID, as I have done in this sharding example. The chance of a collision across collections is then essentially whatever the collision chance of your generation method and content is. A crude version of this _id override can also be seen in this gist (something I used for testing GUIDs some time ago).
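As a rough illustration of supplying your own _id, here is a minimal Python sketch that attaches a random (version 4) UUID to a document before it would be inserted. It uses only the standard library and does not talk to a database. The helper name `new_document` and the `Variety` field are hypothetical, and storing the UUID as a string rather than a binary value is an assumption made for simplicity.

```python
import uuid


def new_document(fields):
    """Return a copy of `fields` with a random UUID string as its _id.

    Hypothetical helper: a real application would pass the result to its
    driver's insert call, and MongoDB would keep the supplied _id.
    """
    doc = dict(fields)
    doc["_id"] = str(uuid.uuid4())  # version 4 UUID: 122 random bits
    return doc


# "Variety" is an illustrative field, not part of the original schema.
apple = new_document({"Variety": "Gala"})
orange = new_document({"Variety": "Navel"})

print(apple["_id"])
print(orange["_id"])
```

With IDs generated this way, the odds of an Apples document and an Oranges document sharing an _id depend only on the UUID generator, not on which collection each document lands in.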

In terms of ObjectIDs themselves, the topic has been covered pretty well already (here and here, for example) and can be inferred from the documentation you linked. Basically, there is a small chance of collision, particularly for ObjectIDs generated by the same host within the same second at high volume, because an ObjectID is built from a timestamp, a machine- and process-specific value, and a counter. For most use cases that is not a concern.
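To see where the collision risk comes from, it can help to take an ObjectID apart. The sketch below assumes the 12-byte layout described in the MongoDB documentation: a 4-byte big-endian timestamp in seconds, 5 bytes tied to the generating machine and process (older versions split these into a machine identifier and a process ID), and a 3-byte counter. The function name `split_object_id` is hypothetical, and the sample value is only an illustrative 24-character hex string.

```python
from datetime import datetime, timezone


def split_object_id(hex_id):
    """Split a 24-character hex ObjectID into its three parts."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectID is exactly 12 bytes (24 hex characters)")
    return {
        # Seconds since the Unix epoch, big-endian.
        "timestamp": int.from_bytes(raw[0:4], "big"),
        # Machine/process-specific bytes, treated as opaque here.
        "process_bytes": raw[4:9].hex(),
        # Incrementing counter, big-endian, 3 bytes wide.
        "counter": int.from_bytes(raw[9:12], "big"),
    }


parts = split_object_id("507f1f77bcf86cd799439011")
created = datetime.fromtimestamp(parts["timestamp"], tz=timezone.utc)

print(created.isoformat())
print(parts["process_bytes"], parts["counter"])
```

Two ObjectIDs can only be equal if all three parts match. Within one process and one second, that requires the 3-byte counter (2^24, about 16.7 million values) to wrap around, which is why collisions only become plausible at very high insert rates.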

Finally, with respect to any database level guarantee of uniqueness across collections, there essentially isn't one. Uniqueness is enforced by the index, and there is no option for an index to span multiple collections, hence uniqueness is only guaranteed within a given collection.
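The per-collection scope of that guarantee can be modelled in a few lines of Python. This is a toy stand-in and not MongoDB itself: the class `ToyCollection` is hypothetical and simply gives each collection its own dictionary keyed by _id, mimicking one unique index per collection. The `Variety` field and the `"shared-id"` value are illustrative.

```python
class ToyCollection:
    """Toy model of a collection: one unique _id index per instance."""

    def __init__(self, name):
        self.name = name
        self._docs = {}  # plays the role of the unique index on _id

    def insert(self, doc):
        if doc["_id"] in self._docs:
            raise KeyError(f"duplicate _id {doc['_id']!r} in {self.name}")
        self._docs[doc["_id"]] = doc


apples = ToyCollection("Apples")
oranges = ToyCollection("Oranges")

# The same _id in two different collections is accepted,
# because each collection checks only its own index.
apples.insert({"_id": "shared-id", "Variety": "Gala"})
oranges.insert({"_id": "shared-id", "Variety": "Navel"})

# A second document with that _id in the same collection is rejected.
try:
    apples.insert({"_id": "shared-id", "Variety": "Fuji"})
    rejected = False
except KeyError:
    rejected = True

print(rejected)
```

In the Orders document from the question, the field names OrangeID and AppleID already say which collection each reference points into, so an Apple and an Orange that happened to share an _id would still be looked up unambiguously.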