Mongodb – Searching for array elements nested in MongoDB Documents

mongodbperformance

I need to store card documents in my database. Cards need to be stored with a set of integers that represent the bundle, box and pallet, or whatever other entity they belong to. The packaging structure needs to be of an arbitrary length to accommodate flexible packing structures (otherwise they would each have their own field in the document.)

I am considering a document structure like this, where the packaging numbers are kept as an array in the Packaging field.

{
"JobNumber" : "50001-01",
"CustomerId" : "joe",
"IdentifierNumber" : NumberLong(8812739),
"TimesPrinted" : 0,
"Packaging" : [ 
    1200, 
    120, 
    3
]
}

The Packaging field will be set once, then searched against many times.

I am wondering about performance because I will often need to query this collection for all documents whose nth element in the Packaging field is a given value. Is this something that MongoDB is good at? What are the indexes I should be considering?

Best Answer

Remember, MongoDB has a dynamic schema. So it is perfectly ok to store this document:

{
  "JobNumber" : "50001-01",
  "CustomerId" : "joe",
  "IdentifierNumber" : NumberLong(8812739),
  "TimesPrinted" : 0,
  "Packaging" : {"bundle":1200,"box":120,"pallet":3}
}

and this document

{
  "JobNumber" : "50001-02",
  "CustomerId" : "jane",
  "IdentifierNumber" : NumberLong(8812739),
  "TimesPrinted" : 0,
  "Packaging" : {"sack":200}
}

in the same collection.

Since, I wouldn't query for the Nth document, but for a given field in the subdocument, for example

 db.collection.find({"packaging.bundle":1200})

which would run just fine with MongoDB. The reason behind that is that if a field isn't present in a document, it is evaluated as null for a query. And null is definitely not equal to 1200.

As for the performance. It really depends on who big your collection is and how your queries look like. While the query as shown above may be rather slow in a collection containing hundred of thousands of documents (or even more) without an index, it can be extremely fast when you created an index on it, e.g.

    db.collection.ensureIndex({"packaging.bundle":1,"packaging.box":1,"packaging.pallet":1});

If you can create an index like this obviously depends on the question wether you really have arbitrary packaging or if you simply have a variety of packaging options. If the latter is the case, I'd create an index for each of the packaging options, utilizing sparse indices, e.g.

 db.collection.ensureIndex({"packaging.sack":1},{sparse:true})

This would reduce the index size, as only documents which hold the field "packaging.sack" would be contained in this index.

If you really have arbitrary fields in the documents, I wonder how you create a model for it ;)

When talking of just some ten thousands of documents, you might even get satisfying result without an index.