MongoDB – Solution for a read-only copy of a standalone MongoDB server

mongodb, mongodb-4.0, read-only-database, replication

I store some data in many databases/collections on a standalone MongoDB server. This server is used for production.

I need to access this data (or at least a nightly copy of it, and not necessarily all databases/collections) for data analysis: read-only, possibly resource-intensive queries. I don't want to run data analysis on the production server.

For now, I use mongoexport and mongoimport to export the data and re-import it into another standalone server. This has several drawbacks:

  • indexes are not kept,
  • the nightly export/re-import is a long-running process that may fail,
  • data is not continuously synced (not absolutely necessary).

I studied the MongoDB replica set documentation, but some questions remain unanswered:

  • Can I choose which databases/collections are replicated?
  • Is a 2-member replica set (a primary and a secondary) eligible?
  • Can I make our production application servers "always" read and write on the primary, and our data-analysis server "always" read from the secondary?

I know I have many questions and this might break Stack Overflow Q/A principles, but I hope the question could be useful for many others.

Best Answer

Can I choose which databases/collections are replicated?

No. A replica set is a group of mongod instances that maintain the same data set, so the entire data set is replicated to every data-bearing member; you cannot select individual databases or collections.

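Is a 2-member replica set (a primary and a secondary) eligible?

A replica set should have an odd number of voting members. A two-member set can be configured, but if either member becomes unavailable the remaining one cannot form a majority, so no primary can be elected and the set stops accepting writes. If you want to maintain only two data-bearing nodes, you would want to include an arbiter in the replica set as a third voting member.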
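As a rough illustration of the layout described above (two data-bearing nodes plus an arbiter), here is a minimal sketch that only builds the replica set configuration document. The set name rs0 and the three host names are made up for the example. The field names (_id, members, host, arbiterOnly) are the ones used in replica set configuration documents. The resulting dict is the kind of document you would pass to rs.initiate() in the shell, or to the replSetInitiate admin command from a driver.

```python
def build_replica_set_config(set_name, data_hosts, arbiter_host):
    """Build a replica set config: data-bearing members plus one arbiter."""
    # Each data-bearing member gets a unique integer _id and a host:port.
    members = [{"_id": i, "host": host} for i, host in enumerate(data_hosts)]
    # The arbiter votes in elections but holds no data.
    members.append(
        {"_id": len(data_hosts), "host": arbiter_host, "arbiterOnly": True}
    )
    return {"_id": set_name, "members": members}


# Hypothetical hosts: production node, analysis node, and a small arbiter.
config = build_replica_set_config(
    "rs0",
    ["prod-db.example.net:27017", "analytics-db.example.net:27017"],
    "arbiter.example.net:27017",
)
voting_members = len(config["members"])
data_bearing = [m for m in config["members"] if not m.get("arbiterOnly")]
```

With this layout there are three voting members, so either data-bearing node can fail and the remaining node plus the arbiter still form a majority.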

Can I make our production application servers "always" read and write on the primary, and our data-analysis server "always" read from the secondary?

All writes must be directed to the primary node of the replica set; clients are not allowed to write to a data-bearing node that is in the SECONDARY replica set state.

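For reads, specifying a read preference is a good way of directing queries to different nodes of a replica set: the default, primary, keeps the application servers on the primary, while secondary sends the data-analysis queries to the secondary.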
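One way to set the read preference is in the connection string each client uses. The sketch below builds two such URIs with the standard library only. The host names and the set name rs0 are hypothetical; replicaSet and readPreference are standard connection string options. Treat it as an illustration, not a drop-in configuration.

```python
from urllib.parse import urlencode


def build_mongo_uri(hosts, replica_set, read_preference):
    """Build a mongodb:// URI that names the replica set and a read preference."""
    query = urlencode(
        {"replicaSet": replica_set, "readPreference": read_preference}
    )
    return "mongodb://" + ",".join(hosts) + "/?" + query


# Hypothetical seed list: the two data-bearing members of the set.
HOSTS = ["prod-db.example.net:27017", "analytics-db.example.net:27017"]

# Application servers: reads and writes both go to the primary (the default).
app_uri = build_mongo_uri(HOSTS, "rs0", "primary")

# Data-analysis server: reads are routed to a secondary.
analytics_uri = build_mongo_uri(HOSTS, "rs0", "secondary")
```

Each URI can then be handed to the driver, for example pymongo.MongoClient(analytics_uri). Note that with secondary, reads fail when no secondary is available; secondaryPreferred falls back to the primary instead. Replication is asynchronous, so the analysis server may see slightly stale data.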