You can store your index as a list of fixed-size offsets into the block containing your key data. For example:
+-----------------------+
| 3                     |  number of entries
+-----------------------+
| 16                    |  offset of first key data
+-----------------------+
| 24                    |  offset of second key data
+-----------------------+
| 39                    |  offset of third key data
+-----------------------+
| key one               |
+-----------------------+
| key number two        |
+-----------------------+
| this is the third key |
+-----------------------+
(well, the key data would be sorted in a real example, but you get the idea).
Note that this does not necessarily reflect how index blocks are actually constructed in any database. This is merely an example of how you might organise a block of index data where the key data is of variable length.
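The layout above can be sketched in code. Here is a minimal Python illustration, assuming 4-byte little-endian integers for the entry count and offsets and offsets measured from the start of the block; these details are purely illustrative and don't come from any real database format:

```python
import struct

def build_block(keys):
    """Pack variable-length keys behind a fixed-size offset table."""
    header_size = 4 * (1 + len(keys))        # count + one offset per key
    offsets, data, pos = [], b"", header_size
    for key in keys:
        offsets.append(pos)
        data += key
        pos += len(key)
    header = struct.pack("<I", len(keys))
    header += b"".join(struct.pack("<I", o) for o in offsets)
    return header + data

def read_key(block, i):
    """Fetch the i-th key through the offset table: an O(1) lookup."""
    count = struct.unpack_from("<I", block, 0)[0]
    start = struct.unpack_from("<I", block, 4 * (1 + i))[0]
    if i + 1 < count:
        end = struct.unpack_from("<I", block, 4 * (2 + i))[0]
    else:
        end = len(block)                     # last key runs to end of block
    return block[start:end]

block = build_block([b"key one", b"key number two", b"this is the third key"])
print(read_key(block, 1))   # b'key number two'
```

The point of the fixed-size offset table is that you can binary-search or index into it directly even though the keys themselves vary in length.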
Some thoughts....
Typically one does not want to store tightly interrelated pieces of information in different systems. The chances of things getting out of sync are significant, and instead of one problem on your hands you now have two. One thing you can do with Mongo, though, is use it to pipeline your data in or out.

My preference is to keep everything in PostgreSQL to the extent this is possible. However, doing so really requires expert knowledge of PostgreSQL programming and is not for shops unwilling to commit to using advanced features. I see a somewhat different set of options than you do; since my preference is not something I see listed, I will give it to you.
You can probably separate your metadata into common data, per-class data, and document data. In this approach you would have a general catalog table holding the basic common information, plus one table per class. Each class table would have an hstore, json, or xml column storing the loosely-structured remainder of the data, alongside ordinary columns for any data that must be significantly constrained. This reduces what you need to define per class while still letting you leverage constraints however you like. The three column types have different issues and are worth considering separately:
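A sketch of that layout (all table and column names here are invented for illustration, using hstore for the loosely-structured part):

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

-- general catalog: the common information for every document
CREATE TABLE catalog (
    doc_id     bigserial   PRIMARY KEY,
    class_name text        NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);

-- one table per class; heavily-constrained data gets real columns,
-- everything else lives in the hstore/json/xml column
CREATE TABLE invoice (
    doc_id      bigint  PRIMARY KEY REFERENCES catalog (doc_id),
    total_cents bigint  NOT NULL CHECK (total_cents >= 0),
    currency    char(3) NOT NULL,
    attrs       hstore
);
```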
hstore is relatively limited but also used by a lot of people. It isn't especially new, but it is only a flat key/value store and is incapable of nested data structures, unlike json and xml.
json is quite new and doesn't do a lot out of the box right now. This doesn't mean you can't do a lot with it, but expect a significant amount of programming, probably in plv8js or, if you want to stick with older environments, plperlu or plpythonu. json is better supported in 9.3, though, at least in the current development snapshots, so things will get better when that version is released.
xml is the best supported of the three, with the most features, and the longest support history. Then again, it is XML.....
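To make the hstore limitation concrete, here is what a flat key/value lookup looks like (table, column, and key names are invented; values always come back as text and there is no nesting to traverse):

```sql
CREATE EXTENSION IF NOT EXISTS hstore;
CREATE TABLE widgets (id int PRIMARY KEY, attrs hstore);
INSERT INTO widgets VALUES (1, 'colour => red, size => large');

SELECT attrs -> 'colour' AS colour   -- -> fetches a value by key
FROM widgets
WHERE attrs ? 'size';                -- ? tests for key presence
```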
However, if you do decide to go with Mongo and PostgreSQL together, note that PostgreSQL supports two-phase commit. That means you can run the write operations, then issue PREPARE TRANSACTION, and if this succeeds do your atomic writes in Mongo. If those succeed, you can then COMMIT PREPARED in PostgreSQL (or ROLLBACK PREPARED if the Mongo side failed).
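A sketch of that sequence (the transaction identifier and table are invented; note that `max_prepared_transactions` must be set above zero for PREPARE TRANSACTION to be usable):

```sql
BEGIN;
UPDATE metadata SET status = 'published' WHERE doc_id = 42;  -- the PostgreSQL side
PREPARE TRANSACTION 'doc-42-sync';  -- transaction now survives, detached from the session

-- ...application performs the matching atomic write in MongoDB here...

COMMIT PREPARED 'doc-42-sync';      -- if the Mongo write succeeded
-- ROLLBACK PREPARED 'doc-42-sync'; -- if it failed
```

A prepared transaction holds its locks until it is committed or rolled back, so the application must be careful not to leave them dangling.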
Best Answer
Since MongoDB is a distributed database, nodes tend to be part of a larger deployment (replica set or sharded cluster) where configuration differences between individual nodes are usually managed externally, for example by using Configuration Management software like Chef or Puppet to generate `mongod` and `mongos` configuration files.

Each `mongod` node has a `local` database which is used for storing node-specific administrative or logging data such as the `startup_log` (a capped collection recording a history of startup parameters and versions) and `oplog.rs` (the replication oplog).

You could store node-specific information in the `local` database, but I think it would make more sense to use a normal collection in your deployment in order to coordinate maintenance tasks and availability. Aside from backup, there aren't many node-specific maintenance tasks that should be performed without manual intervention.

If you do find the need for node-specific tasks, I imagine your maintenance database would have more than a single collection and document. For example, you might have a configuration document per node, and perhaps other collections such as a history of maintenance tasks. Capped collections can be useful for storing a rolling snapshot of "recent" activity within a preset storage allocation.
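If you go that route, a mongo shell sketch might look like the following (the database, collection, and field names are all invented for illustration):

```
use maintenance
// one configuration document per node
db.node_config.update(
    { host: "db1.example.net:27017" },
    { $set: { backupWindow: "02:00-04:00" } },
    { upsert: true }
)
// capped collection: a rolling 1 MB history of maintenance activity
db.createCollection("task_history", { capped: true, size: 1048576 })
db.task_history.insert({ host: "db1.example.net:27017", task: "backup", at: new Date() })
```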
Assuming the details are node-specific, there would be no change in a sharded or replica set environment if you are accessing data in the `local` database. The contents of the `local` database are excluded from replication/sharding, and you should only access the `local` database by connecting directly to a `mongod` node.
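For example, to inspect the node-specific collections mentioned above you would connect straight to the `mongod` rather than through a `mongos` (the host name here is made up):

```
$ mongo db1.example.net:27017/local
> db.startup_log.find().sort({ startTime: -1 }).limit(1)  // most recent startup record
> db.oplog.rs.find().sort({ $natural: -1 }).limit(1)      // newest oplog entry (replica sets only)
```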