MongoDB how to achieve asynchronous geo-replication

mongodbmulti-masterPROXYreplicationsharding

I have read a lot of background about MongoDB replication and sharding. None of the default solutions that I know can actually do what I'm trying to achieve.

Current situation

We have a time critical application balanced over three geographically seperated locations (called A, B and C) that generates reports as JSON documents and writes them on a centralized (external) MongoDB server. We have a different application that reads and processes these reports from the centralized MongoDB server. This works pretty well.

  Application (A) ---->  +---------------------+
                         |                     |
  Application (B) ---->  | Centralized MongoDB | --> Processing
                         |                     |
  Application (C) ---->  +---------------------+

The problem with our current setup is that if the centralized MongoDB server is unresponsive for some reason, the reports are lost and the application gets a nice performance penalty when it's waiting for a response. So basically three independant redundant locations are all relying on a single location for report storage. We are not able to fix this problem in the application.

Desired situation

We would like to have a local MongoDB service listening in each geographically seperated location, so that our application is always able to delegate the reports locally. But eventually we still want to have all reports from these three locations merged on the centralized MongoDB server.

                       +---------------+       +-------------+
  Application (A) -->  | Local MongoDB | ----> |             |
                       +---------------+       |             |
                       +---------------+       | Centralized |
  Application (B) -->  | Local MongoDB | ----> |             | --> Processing
                       +---------------+       |   MongoDB   |
                       +---------------+       |             |
  Application (C) -->  | Local MongoDB | ----> |             |
                       +---------------+       +-------------+

So the way I see it, this is kind of reversed sharding.

Some side notes:

  • This is a one way street. Our application does only inserts. It does not need to read data from storage. On the processing side is only reading required. It does not insert any data.
  • It would be acceptable if some reports are missing during an outage
  • It would be acceptable if the reports are out of order in the centralized storage
  • The replication should be asynchronous, but near real-time. We need to process the reports in a matter of seconds
  • If the local MongoDB process is also using local storage, that would be fine but we don't require it.
  • Any open source or commercial solution would be acceptable, also third-party solutions.
  • It's critical that the local MongoDB process always accepts the report immediately, so the application can do something else right away
  • The application requires MongoDB, so we need to stick with this.

Any help or thoughts would be highly appreciated.

Best Answer

To do oneway replication from multiple "Local MongoDB" sources to one "Centralized MongoDB" realtime is possible with mongo-connector product. Just start one mongo-connector per every Local MongoDB.

I have used this product to replicate the one sharded collection (from nine shards) to one single mongodb.