Some thoughts....
Typically one does not want to store pieces of tightly interrelated information in different systems. The chances of things getting out of sync are significant, and then instead of one problem on your hands you have two. One thing you can do with Mongo, though, is use it to pipeline your data in or out. My preference is to keep everything in PostgreSQL to the extent this is possible. However, I would note that doing so really requires expert knowledge of PostgreSQL programming and is not for shops unwilling to commit to using advanced features. I see a somewhat different set of options than you do. Since my preference is not something I see listed, I will give it to you.
You can probably separate your metadata into common data, data required per class, and document data. In that layout you would have a general catalog table with the basic common information, plus one table per class. Each per-class table would have an hstore, json, or xml column storing the rest of the data, alongside ordinary columns for anything that must be constrained tightly. This keeps what you have to put in the per-class tables small while still letting you leverage constraints however you like (a rough schema sketch follows the three options below). The three options have different issues and are worth considering separately:
hstore is relatively limited but used by a lot of people. It isn't extremely new, but it is only a key/value store and, unlike json and xml, cannot represent nested data structures.
json is quite new and doesn't really do a lot out of the box yet. That doesn't mean you can't do a lot with it, but expect a significant amount of programming, probably in plv8js or, if you want to stick with older environments, plperlu or plpython. json is better supported in 9.3, at least in the current development snapshots, so when that version is released things will get better.
xml is the best supported of the three, with the most features, and the longest support history. Then again, it is XML.....
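As a rough sketch of that layout using hstore (the table and column names here are invented for illustration, not taken from your schema): a catalog table carries the common fields, and each per-class table keeps its tightly constrained attributes as ordinary columns plus an hstore column for the rest.

```sql
CREATE EXTENSION hstore;  -- once per database

-- Common metadata shared by every document class
CREATE TABLE catalog (
    doc_id     serial PRIMARY KEY,
    title      text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);

-- One table per class: tightly constrained data stays in real columns,
-- the looser class-specific attributes go into the hstore column.
CREATE TABLE invoice_meta (
    doc_id   integer PRIMARY KEY REFERENCES catalog (doc_id),
    amount   numeric(12,2) NOT NULL CHECK (amount >= 0),
    currency char(3) NOT NULL,
    attrs    hstore   -- everything else, as key/value pairs
);
```

The same shape works with a json or xml column in place of hstore; only the validation and query tooling around it changes.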
However, if you do decide to use Mongo and PostgreSQL together, note that PostgreSQL supports two-phase commit. That means you can run the write operations in PostgreSQL, issue PREPARE TRANSACTION, and if that succeeds do your atomic writes in Mongo. If those succeed you can then issue COMMIT PREPARED (or ROLLBACK PREPARED if they fail) in PostgreSQL.
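A minimal sketch of that sequence on the PostgreSQL side (the transaction identifier 'pg_mongo_sync' is an arbitrary label; the Mongo writes happen in your application between the statements):

```sql
BEGIN;
-- ... the PostgreSQL write operations ...
PREPARE TRANSACTION 'pg_mongo_sync';  -- transaction is now persisted and detached from the session

-- Perform the corresponding atomic writes in Mongo from the application here.

COMMIT PREPARED 'pg_mongo_sync';      -- if the Mongo writes succeeded
-- ROLLBACK PREPARED 'pg_mongo_sync'; -- if they failed
```

Note that PREPARE TRANSACTION only works when max_prepared_transactions is set above zero, and a prepared transaction left unresolved will keep holding its locks until you commit or roll it back.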
Yes, if the database is open in read-write mode there are always changes to the database that reside only in the current redo log and not in any archived log. If you want to protect the current redo against hardware-related corruption, you need to add redundant storage (either remote replication or a local mirror). If you want to protect the current redo against OS-related corruption, you need a "physical standby" database (the marketing term is DataGuard) with a "SYNC LGWR" feed.
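As a rough illustration of both options (the file path and the 'standby_db' service name are placeholders, not anything from your environment): a local mirror just means a second member per online redo log group on separate storage, and the synchronous standby feed is configured through a LOG_ARCHIVE_DEST_n parameter with the SYNC attribute.

```sql
-- Local mirror: add a second member on separate storage to each redo log group
ALTER DATABASE ADD LOGFILE MEMBER '/u02/oradata/ORCL/redo01b.log' TO GROUP 1;

-- Physical standby feed with synchronous redo transport (SYNC LGWR)
ALTER SYSTEM SET log_archive_dest_2 =
  'SERVICE=standby_db SYNC AFFIRM VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=standby_db';
```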
RMAN would not be useful to you; it is not designed for a rolling backup of the current redo.
I think the problem you have is that the archive format created with --archive is not a tarball (and the docs don't say it is anywhere that I could find). Rather, it is a custom packaging format. Based on a quick scan of the code, it looks like a lightweight format containing a series of headers plus metadata describing the raw BSON. Short of writing a standalone binary to produce a compatible archive file, you can't create one manually.

If you run mongodump without --archive but with --gzip, it gzips the individual dump files, and you can emulate that by doing the dump normally and then gzipping each file in the dump folder separately. Those compressed files could then, in theory, be restored with mongorestore --gzip.

Overall I would advise just using --archive in the 3.2 tools and staying away from trying to recreate it manually, but the --gzip option is as close as you will get without a bunch of work.
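For what it's worth, the two approaches look roughly like this (the database name and paths are placeholders):

```shell
# Recommended: let the 3.2 tools produce the archive (optionally gzipped)
mongodump --archive=backup.archive --gzip
mongorestore --archive=backup.archive --gzip

# Emulation: dump normally, then gzip each dump file individually
mongodump --db mydb --out /backups/dump
gzip /backups/dump/mydb/*.bson /backups/dump/mydb/*.metadata.json

# In theory these per-file gzipped dumps restore with --gzip
mongorestore --gzip /backups/dump
```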