MongoDB CP vs Cassandra AP – Understanding Differences

cassandra, consistency, high-availability, mongodb, partitioning

I have read a lot of articles online but am still confused about why MongoDB is called CP, Cassandra AP, and an RDBMS CA.
I will explain my understanding and ask my questions along the way.

Mongo

Consider a scenario where I have one master and two slaves:

  1. A write request arrives and goes to the master.
  2. It gets committed on the master only, but the master goes down (crashes) before the write reaches the slaves.
  3. Until a new master is elected, write requests have to wait and the system is not available.
  4. Once the previous node (the one that crashed in step 2) comes back, the writes pending on that node are written to the slaves. This is called eventual consistency.

As I understand it, because of steps 3 and 4, MongoDB is said to be CP, where C stands for eventual consistency. Is that correct?

Cassandra

Here there is no master/slave model, and every node takes its share of write and read requests based on the shard key.

  1. A write request arrives at any node (called the coordinator node).
  2. The coordinator node redirects it to one of the nodes based on the shard key.
  3. The write gets committed, but that node goes down (crashes) before the write reaches the other replica nodes.
  4. Another write request arrives with the same shard key; the coordinator node now redirects it immediately to a replica node (a replica of the crashed node).
  5. Once the previous node (the one that crashed in step 3) comes back, the writes pending on that node are written to the replica nodes. So Cassandra seems to be eventually consistent too?

Step 4 explains why Cassandra is highly available, but step 5 also shows that it is eventually consistent. So, as I understand it, Cassandra provides both eventual consistency and availability. Then why is it said that it does not provide consistency?

Best Answer

C stands for eventual consistency. Is that correct?

Consistency in the CAP theorem refers to strong consistency, where every read receives the most recent write or an error. By default, MongoDB drivers direct all reads and writes to the primary of a replica set, which gives strongly consistent behaviour.

The CAP theorem asserts that a distributed system must choose between consistency and availability in the event of a network partition. MongoDB's replica set approach uses a single primary for write consistency (CP), while Cassandra's replication strategy favours write availability (AP). Strong consistency is not possible with a network partition because there could be a conflict if both sides of the partition update the same data. To maintain write availability AP database systems need a solution for conflict resolution, which is a separate consideration from eventual consistency.
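The trade-off described above can be sketched with a toy model. This is only an illustration, not how either database is implemented: `Side`, `cp_write`, `ap_write`, and `merge_last_write_wins` are hypothetical names, and last-write-wins by timestamp is just one possible conflict-resolution strategy.

```python
from dataclasses import dataclass, field


@dataclass
class Side:
    """One side of a network partition, holding its own copy of the data."""
    name: str
    members: int                      # replicas reachable on this side
    data: dict = field(default_factory=dict)


def cp_write(side, total_members, key, value):
    """CP-style write: accept only if this side holds a majority of replicas."""
    if side.members <= total_members // 2:
        raise RuntimeError(f"{side.name}: no majority, write rejected")
    side.data[key] = value


def ap_write(side, key, value, timestamp):
    """AP-style write: always accept, tagging the value with a timestamp."""
    side.data[key] = (timestamp, value)


def merge_last_write_wins(a, b):
    """After the partition heals, keep the newest write for each key."""
    merged = dict(a.data)
    for key, stamped in b.data.items():
        if key not in merged or stamped[0] > merged[key][0]:
            merged[key] = stamped
    return merged


# Three replicas split 2 / 1 by a partition.
majority_side, minority_side = Side("majority", 2), Side("minority", 1)
cp_write(majority_side, 3, "x", "a")            # accepted
try:
    cp_write(minority_side, 3, "x", "b")        # rejected: consistent, but unavailable
except RuntimeError as err:
    print(err)

# The AP variant accepts both writes and reconciles them later.
left, right = Side("left", 2), Side("right", 1)
ap_write(left, "x", "a", timestamp=1)
ap_write(right, "x", "b", timestamp=2)
print(merge_last_write_wins(left, right))
```

In the CP variant the minority side refuses the write, so the two sides can never disagree. In the AP variant both sides stay writable, which is why a merge step is needed once they can talk to each other again.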

However, CAP is a simplification of real-world behaviour: MongoDB and Cassandra both have tunable levels of consistency for reads and writes. For example: MongoDB has write concerns to determine the level of acknowledgement required for write operations, read preferences for routing requests to members of a replica set, and read concerns to control the recency, consistency, and isolation properties of data read from replica set and sharded deployments.
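As a rough illustration of what "tunable" means, the generic quorum-overlap rule can be written in a few lines. This is a simplified model, not a MongoDB or Cassandra API: `n` is the number of replicas, `w` the number that must acknowledge a write, and `r` the number consulted on a read, and the function names are made up for this sketch.

```python
def majority(n):
    """Smallest number of replicas that forms a majority of n."""
    return n // 2 + 1


def reads_see_latest_write(n, w, r):
    """True when every set of r replicas overlaps every set of w replicas."""
    return r + w > n


n = 3
# Fast but weak: one acknowledgement, one replica read.
print(reads_see_latest_write(n, w=1, r=1))
# Majority writes and majority reads always overlap in at least one replica.
print(reads_see_latest_write(n, w=majority(n), r=majority(n)))
```

Lower values of `w` and `r` keep the system responsive when replicas are unreachable, while higher values give stronger guarantees at the cost of availability. That is the same trade-off CAP describes, chosen per operation rather than per database.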

Eric Brewer, author of the CAP Theorem, revisited this in 2012 with a more nuanced take: CAP Twelve Years Later: How the "Rules" Have Changed.

  3. Until a new master is elected, write requests have to wait and the system is not available.

There are no writes without a primary, but replica sets still have read availability. MongoDB 3.6 added a Retryable Writes feature which helps applications better handle replica set elections and transient network errors.
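The idea behind retrying a write across an election can be sketched as below. This is a loose model only: `NoPrimaryError` and `write_with_one_retry` are hypothetical names, not driver APIs, and the sketch does not model how a real driver makes the retried write safe to repeat.

```python
class NoPrimaryError(Exception):
    """Hypothetical error raised while the replica set has no primary."""


def write_with_one_retry(do_write):
    """Run do_write(); if there is no primary, retry exactly once."""
    try:
        return do_write()
    except NoPrimaryError:
        return do_write()


attempts = []


def flaky_write():
    """Fails on the first call (election in progress), succeeds afterwards."""
    attempts.append(1)
    if len(attempts) == 1:
        raise NoPrimaryError("election in progress")
    return "acknowledged"


print(write_with_one_retry(flaky_write))
```

If the second attempt also fails, the error propagates to the application, which still has to decide how to handle a longer outage.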

  4. Once the previous node (the one that crashed in step 2) comes back, the writes pending on that node are written to the slaves.

If the primary in a MongoDB replica set becomes unavailable, the remaining members of the replica set will elect a new primary if there is an eligible secondary and a quorum of voting members. In your example, the voting majority would be 2 of the 3 members of your replica set. Any writes accepted by a former primary that were not written to a majority of replica set members will be rolled back (saved to disk) so the former primary resumes syncing from a state consistent with the history of the current primary.
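The majority arithmetic and the rollback can be sketched as follows. This is a simplified model under stated assumptions, not MongoDB's actual election or rollback code: the function names are hypothetical, and the former primary's log is treated as a plain list in which the first `majority_committed` entries reached a majority.

```python
def can_elect_primary(voting_members, reachable_voters):
    """A new primary needs a strict majority of the voting members."""
    return reachable_voters >= voting_members // 2 + 1


def split_on_rejoin(old_primary_log, majority_committed):
    """Split a former primary's log into entries kept and entries rolled back.

    Entries up to the majority-committed point are kept; anything after it
    was accepted by the old primary alone and is rolled back.
    """
    kept = old_primary_log[:majority_committed]
    rolled_back = old_primary_log[majority_committed:]
    return kept, rolled_back


# Three voting members: the two survivors can elect, a lone node cannot.
print(can_elect_primary(3, 2))
print(can_elect_primary(3, 1))

# The old primary accepted w3 but crashed before it was replicated.
kept, rolled_back = split_on_rejoin(["w1", "w2", "w3"], majority_committed=2)
print(kept, rolled_back)
```

So the write from step 2 of your scenario is not pushed to the other members when the old primary returns; it is set aside so that the returning node matches the new primary's history.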