Remember, MongoDB has a dynamic schema. So it is perfectly ok to store this document:
{
"JobNumber" : "50001-01",
"CustomerId" : "joe",
"IdentifierNumber" : NumberLong(8812739),
"TimesPrinted" : 0,
"Packaging" : {"bundle":1200,"box":120,"pallet":3}
}
and this document
{
"JobNumber" : "50001-02",
"CustomerId" : "jane",
"IdentifierNumber" : NumberLong(8812739),
"TimesPrinted" : 0,
"Packaging" : {"sack":200}
}
in the same collection.
I wouldn't query for the Nth document, but for a given field in the subdocument, for example
db.collection.find({"Packaging.bundle":1200})
which runs just fine with MongoDB. Note that field names are case-sensitive, so the path has to be spelled "Packaging" as in the documents, not "packaging". The reason this works is that a field which isn't present in a document is evaluated as null for the query, and null is definitely not equal to 1200.
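To make the matching rule concrete, here is a minimal in-memory sketch in plain JavaScript. It is not how MongoDB implements queries; resolvePath and matches are hypothetical helpers written for illustration, and the sketch ignores arrays and query operators. It only shows that a document lacking the field resolves to null and therefore does not match 1200.

```javascript
// Resolve a dotted path such as "Packaging.bundle" against a plain object.
// Returns null as soon as any segment of the path is missing.
function resolvePath(doc, path) {
  let current = doc;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object" || !(key in current)) {
      return null;
    }
    current = current[key];
  }
  return current;
}

// Hypothetical helper: plain equality match for a { path: value } filter.
function matches(doc, filter) {
  return Object.entries(filter).every(
    ([path, expected]) => resolvePath(doc, path) === expected
  );
}

// The two sample documents from above, reduced to the relevant fields.
const docs = [
  { JobNumber: "50001-01", CustomerId: "joe", Packaging: { bundle: 1200, box: 120, pallet: 3 } },
  { JobNumber: "50001-02", CustomerId: "jane", Packaging: { sack: 200 } },
];

const found = docs.filter((doc) => matches(doc, { "Packaging.bundle": 1200 }));
console.log(found.map((doc) => doc.JobNumber));
```

Only the first document is selected. The second one has no bundle field, so the path resolves to null and the comparison with 1200 fails without any error.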
As for performance: it really depends on how big your collection is and what your queries look like. While the query shown above may be rather slow in a collection containing hundreds of thousands of documents (or even more) without an index, it can be extremely fast once you have created an index on it, e.g.
db.collection.createIndex({"Packaging.bundle":1,"Packaging.box":1,"Packaging.pallet":1});
Whether you can create an index like this obviously depends on whether you really have arbitrary packaging or simply a variety of packaging options. If the latter is the case, I'd create an index for each of the packaging options, utilizing sparse indices, e.g.
db.collection.createIndex({"Packaging.sack":1},{sparse:true})
This would reduce the index size, as only documents which hold the field "Packaging.sack" would be contained in this index.
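The size difference can be illustrated with a toy model in plain JavaScript. This is only a sketch under simplifying assumptions: buildIndex and compareKeys are hypothetical helpers, the "index" is just a sorted array of [key, JobNumber] pairs rather than a real B-tree, and the third document is made up for the example.

```javascript
// Order keys so that null sorts before any number.
function compareKeys(a, b) {
  if (a === b) return 0;
  if (a === null) return -1;
  if (b === null) return 1;
  return a - b;
}

// Build a toy index as a sorted list of [key, JobNumber] entries.
// With sparse = true, documents lacking the field are skipped;
// otherwise they get an entry with a null key.
function buildIndex(docs, path, sparse) {
  const entries = [];
  for (const doc of docs) {
    // Walk the dotted path; undefined means the field is missing.
    const value = path.split(".").reduce(
      (current, key) =>
        current != null && typeof current === "object" ? current[key] : undefined,
      doc
    );
    if (value === undefined) {
      if (sparse) continue;
      entries.push([null, doc.JobNumber]);
    } else {
      entries.push([value, doc.JobNumber]);
    }
  }
  return entries.sort((a, b) => compareKeys(a[0], b[0]));
}

const docs = [
  { JobNumber: "50001-01", Packaging: { bundle: 1200, box: 120, pallet: 3 } },
  { JobNumber: "50001-02", Packaging: { sack: 200 } },
  { JobNumber: "50001-03", Packaging: { sack: 50 } }, // made-up document
];

const sparseIndex = buildIndex(docs, "Packaging.sack", true);
const fullIndex = buildIndex(docs, "Packaging.sack", false);
console.log(sparseIndex.length, fullIndex.length);
```

In this model the sparse variant holds two entries and the regular one three, because the document without a sack field is simply left out. The flip side is that a sparse index cannot be used to find the documents that lack the field.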
If you really have arbitrary fields in the documents, I wonder how you create a model for it ;)
When talking about just some tens of thousands of documents, you might even get satisfying results without an index.
Best Answer
The general index format used by MongoDB's included storage engines as of 3.0.x (MMAPv1 and WiredTiger) is a B-tree; however, there are more nuances in the technical implementation.
MongoDB 3.0 introduced a storage engine API which separates the concerns of storage formats (i.e. data & index representations on disk and in memory) from the core server product. For example, WiredTiger supports index prefix compression, data compression, and more granular concurrency than MMAPv1.
It is expected that alternative storage engine implementations can (and will) differ in their underlying implementation of indexes and data storage to suit different workloads. For example, WiredTiger has support for LSM trees (log-structured merge-trees), which is expected to be available in the MongoDB 3.2 production release. There are also alternative storage engines such as RocksDB (which uses LSM) and TokuMXse (which uses Tokutek's fractal tree storage).
I'm not aware of any pluggable storage engines for MongoDB that have been specifically optimized for storage of spatial data, but it is conceivable that one may be created.