I am trying to create a schema for documents and collections for a MongoDB database I am creating. I have read a few documents describing the _id field that is generated for each document upon creation.
I have also read this article about implementing One-to-N relationships in Mongo and have decided that my case lies in the "One-to-Many" category. As a result I am going to reference multiple other documents in multiple other collections by their _id field from my "current" collection. From what I can tell from the article, it seems this will work just fine. However, I faintly remember reading an article saying that _id fields are not guaranteed to be unique across different collections, only within a collection (forgive me, I cannot find a link to it).
So for example, say I have three collections for a company that sells Apples and Oranges: Orders, Apples, and Oranges. Orders holds documents about orders for Apples and Oranges. Apples and Oranges hold data about various types of those fruits.
In an Orders document I might have the fields
{
    ...
    "OrangesOrdered" :
    [
        { "Amount" : 2, "OrangeID" : "OrangeObjectID12"}
    ]
    ...
    "ApplesOrdered" :
    [
        { "Amount" : 4, "AppleID" : "AppleObjectID65"}
    ]
    ...
}
Where OrangeObjectID12 is the _id field of a specific type of Orange and AppleObjectID65 is the _id field of a specific type of Apple.
Is it possible that these two _id values could ever be the same?
Best Answer
Strictly speaking, _id fields are as unique as you make them. As you have discovered, the default values for _id are ObjectIDs, but you can populate the field with any data you wish.
Therefore, for example, you could use a UUID as I have done in this sharding example, and then the chance of a collision across collections would essentially be whatever the collision chance is of your generation method and content. A crude version of this _id override can also be seen in this gist (something I used for testing GUIDs some time ago).
In terms of ObjectIDs themselves, the topic has been covered pretty well already (here and here for example) and can be inferred from the documentation you linked. Basically, there is a chance of collision, particularly for ObjectIDs generated by the same host within the same second at a high volume, but for most use cases that is not a concern.
Finally, with respect to any database-level guarantee of uniqueness across collections, there essentially isn't one. Uniqueness is enforced by the index, and there is no option for an index to span multiple collections, hence uniqueness is only guaranteed within a given collection.
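To make the per-collection point concrete, here is a rough sketch in plain Python. It is a toy model, not MongoDB or any driver: ToyCollection, DuplicateIdError and the "shared-id" value are hypothetical names invented for illustration, and a dict stands in for each collection's unique _id index. It also fills in a random UUID string when no _id is supplied, along the lines of the UUID override mentioned above.

```python
import uuid


class DuplicateIdError(Exception):
    """Hypothetical error for this sketch: the _id already exists in this collection."""


class ToyCollection:
    """Toy stand-in for a collection with its own unique _id index."""

    def __init__(self, name):
        self.name = name
        self._by_id = {}  # plays the role of this collection's unique _id index

    def insert_one(self, doc):
        doc = dict(doc)
        # Assumption for the sketch: generate a random UUID string as _id
        # when the caller does not supply one (instead of an ObjectID).
        doc.setdefault("_id", str(uuid.uuid4()))
        if doc["_id"] in self._by_id:
            raise DuplicateIdError(f"duplicate _id {doc['_id']!r} in {self.name}")
        self._by_id[doc["_id"]] = doc
        return doc["_id"]


apples = ToyCollection("Apples")
oranges = ToyCollection("Oranges")

# The same _id is accepted in two different collections,
# because each index only knows about its own collection.
apples.insert_one({"_id": "shared-id", "variety": "Gala"})
oranges.insert_one({"_id": "shared-id", "variety": "Navel"})

# Within a single collection the duplicate is rejected.
try:
    apples.insert_one({"_id": "shared-id", "variety": "Fuji"})
    rejected = False
except DuplicateIdError:
    rejected = True

# No _id supplied: the sketch generates a UUID string.
generated_id = oranges.insert_one({"variety": "Valencia"})
print(rejected, generated_id)
```

In the Orders example from the question, an Apple and an Orange sharing an _id would not be ambiguous as long as each reference is stored in a field that says which collection to look in (OrangeID versus AppleID), since every lookup goes against one specific collection.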