MongoDB 3.x – Data Structure for Geospatial Indexes

index, mongodb

For the MongoDB 3.x geospatial features, what structure is used for the indexes? B-tree? R-tree? X-tree? I can't find any information about the structures in the documentation.

Best Answer

The general index format used by MongoDB's included storage engines as of 3.0.x (MMAPv1 and WiredTiger) is B-tree; however, there are further nuances in the technical implementation.

MongoDB 3.0 introduced a storage engine API which separates the concerns of storage formats (i.e. data & index representations on disk and in memory) from the core server product. For example, WiredTiger supports index prefix compression, data compression, and more granular concurrency than MMAPv1.

It is expected that alternative storage engine implementations can (and will) differ in their underlying implementation of indexes and data storage to suit different workloads. For example, WiredTiger has support for LSM (log-structured merge) trees, and that support is expected to be available in the MongoDB 3.2 production release. There are also alternative storage engines such as RocksDB (which uses LSM) and TokuMXse (which uses Tokutek's fractal tree storage).

I'm not aware of any pluggable storage engines for MongoDB that have been specifically optimized for storage of spatial data, but it is conceivable that one may be created.
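As a rough illustration of how two-dimensional points can be stored in a one-dimensional structure such as a B-tree at all: MongoDB's documentation describes `2d` indexes as storing geohash values computed from the coordinate pairs. The sketch below is not MongoDB's code. The function name `geohashBits`, the bit count, and the string representation of the key are all assumptions made for illustration; it only shows the general idea of interleaving longitude and latitude bits so that nearby points tend to share a key prefix.

```javascript
// Illustrative sketch only, not MongoDB source code.
// Interleaves longitude and latitude bits into one sortable string key.
function geohashBits(lng, lat, bits) {
  const lngRange = [-180, 180];
  const latRange = [-90, 90];
  let key = "";
  for (let i = 0; i < bits; i++) {
    // Even steps halve the longitude range, odd steps the latitude range.
    const range = i % 2 === 0 ? lngRange : latRange;
    const value = i % 2 === 0 ? lng : lat;
    const mid = (range[0] + range[1]) / 2;
    if (value >= mid) {
      key += "1";
      range[0] = mid; // keep the upper half
    } else {
      key += "0";
      range[1] = mid; // keep the lower half
    }
  }
  return key;
}

// Hypothetical sample points (longitude, latitude).
const berlin = geohashBits(13.40, 52.52, 16);
const potsdam = geohashBits(13.06, 52.40, 16);
const sydney = geohashBits(151.21, -33.87, 16);

console.log(berlin, potsdam, sydney);
```

Keys built this way sort as plain strings, so an ordinary ordered index can hold them; the two nearby cities end up with the same leading bits, while the distant one diverges early.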

Related Solutions

What structure is used for storing CONTEXT, CTXCAT and CTXRULE Oracle text indexes

The documentation covers this thoroughly.

See this link for more detail.

MongoDB – Searching for Array Elements Nested in Documents

Remember, MongoDB has a dynamic schema. So it is perfectly ok to store this document:

{
  "JobNumber" : "50001-01",
  "CustomerId" : "joe",
  "IdentifierNumber" : NumberLong(8812739),
  "TimesPrinted" : 0,
  "Packaging" : {"bundle":1200,"box":120,"pallet":3}
}

and this document

{
  "JobNumber" : "50001-02",
  "CustomerId" : "jane",
  "IdentifierNumber" : NumberLong(8812739),
  "TimesPrinted" : 0,
  "Packaging" : {"sack":200}
}

in the same collection.

I wouldn't query for the Nth document, but for a given field in the subdocument, for example

    db.collection.find({"Packaging.bundle":1200})

This runs just fine in MongoDB, even though the second document has no "Packaging.bundle" field: if a field isn't present in a document, it is evaluated as null for the query, and null is definitely not equal to 1200. Note that field names are case-sensitive, so the path has to match the capitalisation used in the documents.
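The missing-field behaviour can be sketched in plain JavaScript. This is only a simplified model of the matching rule described here, not how the server implements it: `getPath` is a hypothetical helper, arrays are ignored, and the path is written as "Packaging.bundle" to match the capitalisation of the sample documents.

```javascript
// Hypothetical helper: resolve a dotted path, yielding null when any part is missing.
function getPath(doc, path) {
  let current = doc;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object" || !(key in current)) {
      return null; // a missing field is treated as null
    }
    current = current[key];
  }
  return current;
}

// Trimmed copies of the two sample documents.
const sampleDocs = [
  { JobNumber: "50001-01", Packaging: { bundle: 1200, box: 120, pallet: 3 } },
  { JobNumber: "50001-02", Packaging: { sack: 200 } },
];

// Rough equivalent of find({"Packaging.bundle": 1200}) on the sample documents.
const matches = sampleDocs.filter(
  (doc) => getPath(doc, "Packaging.bundle") === 1200
);

console.log(matches.map((doc) => doc.JobNumber));
```

Only the first document is selected; the second resolves to null for that path and is simply skipped rather than causing an error.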

As for performance, it really depends on how big your collection is and what your queries look like. While the query shown above may be rather slow without an index in a collection containing hundreds of thousands of documents (or even more), it can be extremely fast once you have created an index on it, e.g.

    db.collection.createIndex({"Packaging.bundle":1,"Packaging.box":1,"Packaging.pallet":1});

Whether you can create an index like this obviously depends on whether you really have arbitrary packaging or simply a variety of packaging options. If the latter is the case, I'd create an index for each of the packaging options, using sparse indexes, e.g.

    db.collection.createIndex({"Packaging.sack":1},{sparse:true})

This would reduce the index size, as only documents which hold the field "Packaging.sack" would be contained in this index.
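To make the size difference concrete, here is a toy model in plain JavaScript. It is a sketch under loud assumptions: `buildIndex` is a hypothetical function, and an "index" is modelled as a simple list of `[value, JobNumber]` entries rather than a real B-tree. It only illustrates that a sparse index skips documents lacking the field, while a regular index keeps a null entry for them.

```javascript
// Illustrative only: model an index as a list of [value, JobNumber] entries.
function buildIndex(docs, field, sparse) {
  const entries = [];
  for (const doc of docs) {
    const packaging = doc.Packaging || {};
    const hasField = field in packaging;
    if (sparse && !hasField) {
      continue; // sparse: documents without the field get no entry
    }
    // Non-sparse: a missing field is indexed as null.
    entries.push([hasField ? packaging[field] : null, doc.JobNumber]);
  }
  return entries;
}

// Trimmed copies of the two sample documents.
const jobs = [
  { JobNumber: "50001-01", Packaging: { bundle: 1200, box: 120, pallet: 3 } },
  { JobNumber: "50001-02", Packaging: { sack: 200 } },
];

const regularIndex = buildIndex(jobs, "sack", false);
const sparseIndex = buildIndex(jobs, "sack", true);

console.log(regularIndex.length, sparseIndex.length);
```

With many packaging options and one index per option, most documents would be absent from most of the sparse indexes, which is where the saving comes from.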

If you really have arbitrary fields in the documents, I wonder how you create a model for it ;)

When talking about just a few tens of thousands of documents, you might even get satisfactory results without an index.
