You are correct: with MongoDB, the way to engineer around a write-contention issue is to shard.
Your environment sounds fairly bursty, in that you're not continually ingesting data but are instead ingesting it in fairly discrete chunks. With this in mind, you could go with a collector/distributor model such as this:
```
    ------>{ workers }<------
    |                       |
[Shard01]              [Shard02]
    |                       |
    ---->[Persistent]<-------
```
The workers would upsert/push their results into the sharded collection/database, and once the job is complete, a batch process uses something like db.copyDatabase() to push the result set to the monolithic (and cheaper to run) single-instance Mongo. As the copy-database process can push all of its updates in one run, it should experience far fewer write-lock problems.
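As a rough sketch of that batch step -- assuming pymongo, placeholder hostnames and database names, and a pre-4.2 server (the copydb command behind db.copyDatabase() was deprecated in 4.0 and removed in 4.2) -- the copy could look like the snippet below. Whether copydb can read directly from your sharded source depends on your topology and version, so treat this as illustrative rather than definitive:

```python
from pymongo import MongoClient

# Connect to the cheap single-instance "persistent" Mongo.
# Hostnames and database names below are placeholders.
archive = MongoClient("persistent-host", 27017)

# copydb runs on the destination and pulls from `fromhost` in one pass.
# (Deprecated in MongoDB 4.0 and removed in 4.2; on newer versions
# mongodump/mongorestore would fill the same batch-window role.)
archive.admin.command(
    "copydb",
    fromhost="mongos-host:27017",  # source deployment the workers wrote to
    fromdb="job_results",          # database holding the finished results
    todb="job_results_archive",    # destination on the single instance
)
```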
You can get rid of the pager, but not if you expect 50% or more of your nodes to fail at once.
PXC (Galera), spread across 3 geographic locations, is an excellent HA configuration.
Optionally, one of them can be "garbd" (the Galera Arbitrator) -- a lightweight, data-free node that exists only to provide quorum.
If any one server goes down, the other two declare that they have a quorum and continue running.
If the network goes down, and 2 servers are split from the 3rd, then the 2 declare quorum and the 1 gracefully fades away.
"Normally" you need to plan for a single point of failure. That is what the above provides.
If you really need to survive (without a midnight page) two failures, then you need 5 (or more) nodes in the cluster -- and place them in 5 geographic locations. (Else a tornado, earthquake, etc, could take out multiple nodes.)
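To make the quorum arithmetic concrete, here is a minimal sketch (plain Python, purely illustrative) of the strict-majority rule Galera applies -- it shows why 3 nodes survive one failure and 5 nodes survive two:

```python
def has_quorum(nodes_visible: int, cluster_size: int) -> bool:
    """A partition keeps serving only if it sees a strict majority."""
    return nodes_visible > cluster_size / 2

# 3-node cluster (a garbd counts as a node): survives 1 failure, not 2.
assert has_quorum(2, 3)        # one node down -> the two keep running
assert not has_quorum(1, 3)    # two nodes down -> the one fades away

# 5-node cluster: survives 2 failures, per the recommendation above.
assert has_quorum(3, 5)
assert not has_quorum(2, 5)
```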
I am unclear on what you mean by a "Slave tracking all nodes". Perhaps you want to set up two 3-node clusters with async replication from one to the other? That is possible, and it has some degree of failover.
Some people even put 3 nodes in a single location at one end and the other 3 in a single location at the other end. This gives them most of the HA benefit, but without the dreaded latency you may experience between the 3+ nodes of a WAN cluster.
Depending on where your clients are, and which node each client prefers to talk to, a WAN Cluster may be both faster and more reliable than any non-Cluster configuration. (That is another discussion.)
As for "all but one node can go down" -- that is [probably] a theoretical impossibility, especially when you consider that "split brain" is one of the failure cases. The Cluster depends on discovering which subset of more than 50% of the nodes are still talking to each other.
Best Answer
MongoDB does not (as of 4.0) have a supported feature or tool for selective replication or syncing between two distinct deployments.
I would recommend upgrading from 3.4 to a newer version of MongoDB (ideally 4.0) and building a sync solution using the Change Streams API. You can choose which events to replicate to a historical archive or deployment based on the event type, namespace, or other criteria. Change streams are available for replica sets and sharded clusters that use the WiredTiger storage engine and replication protocol version 1. MongoDB 3.6 was the first version to include the Change Streams API, with support for watching individual collections; MongoDB 4.0 extended change streams to watching all non-system collection changes at the database or deployment scope.
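As a minimal sketch of that approach -- assuming pymongo, MongoDB 4.0, and placeholder URIs and names (the hosts, `app_db`, and the `$match` filter are all illustrative) -- a database-scoped change stream can forward selected events to the historical deployment:

```python
from pymongo import MongoClient

# Source and archive deployments -- URIs and names are placeholders.
source = MongoClient("mongodb://source-host/?replicaSet=rs0")
archive = MongoClient("mongodb://archive-host:27017")

# Select which events to replicate: by operation type and namespace.
# (The filter values here are illustrative.)
pipeline = [
    {"$match": {
        "operationType": {"$in": ["insert", "update", "replace"]},
        "ns.coll": {"$nin": ["scratch", "tmp"]},
    }}
]

# Database-scoped watch requires MongoDB 4.0+; on 3.6 you would call
# watch() on individual collections instead.
with source["app_db"].watch(pipeline, full_document="updateLookup") as stream:
    for event in stream:
        doc = event.get("fullDocument")
        if doc is not None:
            ns = event["ns"]
            # Upsert the majority-committed document into the archive.
            archive[ns["db"]][ns["coll"]].replace_one(
                {"_id": doc["_id"]}, doc, upsert=True
            )
```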
If you are unable to upgrade from MongoDB 3.4 in the near future, you could also consider the approach of directly tailing the replication oplog. This would be a less robust solution than the Change Streams API, but there are some third-party tools such as mongo-connector that may be helpful. The Change Streams API also uses the oplog but adds a supported API, only includes majority-committed operations (that won't roll back), and scales to support sharded clusters. The oplog format is internal and subject to change between releases of MongoDB.
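For comparison, a bare-bones oplog tail with pymongo might look like the sketch below (the replica set URI and the `app_db.events` namespace filter are placeholders). Note that, unlike change streams, it reads the internal oplog format directly and can observe operations that later roll back:

```python
import time

from pymongo import CursorType, MongoClient

client = MongoClient("mongodb://source-host/?replicaSet=rs0")
oplog = client.local.oplog.rs

# Start from the newest oplog entry so only new operations are seen.
last = next(oplog.find().sort("$natural", -1).limit(1))
ts = last["ts"]

while True:
    # Tailable "await" cursor: blocks briefly waiting for new entries.
    cursor = oplog.find(
        {"ts": {"$gt": ts}, "ns": "app_db.events"},  # namespace filter (placeholder)
        cursor_type=CursorType.TAILABLE_AWAIT,
        oplog_replay=True,  # lets the server optimize the ts-based scan
    )
    for entry in cursor:
        ts = entry["ts"]
        # entry["op"]: "i" = insert, "u" = update, "d" = delete.
        print(entry["op"], entry["ns"], entry.get("o"))
    time.sleep(1)  # cursor exhausted or died; re-establish and resume
```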