Data replication in Cassandra

cassandra nosql

We have a 5 node cluster in Cassandra in which we have created a keyspace with RF=3.

After creating a column family (table) and inserting data into it, the data appears to be replicated to all 5 nodes even though RF=3.

Shouldn't the data be replicated to 3 nodes only and thus be accessible in 3 nodes and not all 5 nodes?

Also, we have used vnodes.

Best Answer

Perhaps I'm misunderstanding your question, but when you write data to the cluster, the partition key (the first part of the primary key) is hashed by the partitioner, usually Murmur3Partitioner, which is the default. The data is then written to the node responsible for the token range that hash falls into, which could be any one of your 5 nodes.

In addition, your data is written at the same time to 2 other nodes, for a total of 3 copies spread around the cluster. When you run nodetool status your_keyspace (the command takes a keyspace name, not a column family), the "Owns (effective)" column should show each node responsible for roughly 60% of the data (3 copies / 5 nodes).
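To make the 60% figure concrete, here is a minimal sketch of replica placement on a token ring. It is a simplification under stated assumptions, not Cassandra's actual code: it uses one evenly spaced token per node instead of vnodes, MD5 as a stand-in for Murmur3 (which is not in the Python standard library), and SimpleStrategy-style placement where the extra replicas go to the next nodes clockwise. The node names and the helper functions `token_for`, `replicas_for`, and `ownership` are hypothetical.

```python
import bisect
import hashlib

NODES = ["node1", "node2", "node3", "node4", "node5"]  # hypothetical node names
RF = 3                # replication factor of the keyspace
RING_SIZE = 2 ** 64   # size of the toy token space

# One evenly spaced token per node (assumption: no vnodes, for readability).
TOKENS = sorted((i * RING_SIZE // len(NODES), node) for i, node in enumerate(NODES))
TOKEN_VALUES = [token for token, _ in TOKENS]


def token_for(key):
    """Hash a partition key onto the ring (MD5 stands in for Murmur3)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def replicas_for(key):
    """Return the RF nodes that store this key.

    The first replica is the node whose token is the first one greater than
    or equal to the key's token (wrapping around the ring); the remaining
    replicas are the next nodes clockwise.
    """
    idx = bisect.bisect_left(TOKEN_VALUES, token_for(key)) % len(TOKENS)
    return [TOKENS[(idx + i) % len(TOKENS)][1] for i in range(RF)]


def ownership(num_keys=10000):
    """Fraction of all keys that each node stores a copy of."""
    counts = {node: 0 for node in NODES}
    for i in range(num_keys):
        for node in replicas_for("key-{}".format(i)):
            counts[node] += 1
    return {node: count / num_keys for node, count in counts.items()}


if __name__ == "__main__":
    print(replicas_for("user:42"))  # always exactly 3 distinct nodes
    print(ownership())              # each value should be close to 0.6
```

Every key lands on exactly 3 of the 5 nodes, and because the keys spread evenly over the ring, each node ends up holding a copy of about 3/5 of them. On a real cluster, nodetool getendpoints followed by the keyspace, table, and partition key reports which nodes hold the replicas of a specific key.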

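It doesn't matter which node you query for a read, because all of the nodes are equal and any node can act as a coordinator: it forwards the request to the replicas that hold the data and passes the results back to the client. That is why the rows are readable through all 5 nodes even though only 3 of them store each row.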
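The difference between "stored on 3 nodes" and "readable from 5 nodes" can be sketched with a toy in-memory cluster. This is only an illustration under assumptions: placement is simplified to "hash picks the first replica, the next RF-1 nodes in list order hold the others", there are no consistency levels or failures, and the names `storage`, `replicas_for`, `write`, and `read` are hypothetical.

```python
import hashlib

NODES = ["node1", "node2", "node3", "node4", "node5"]  # hypothetical node names
RF = 3

# What each node physically holds on its own disk.
storage = {node: {} for node in NODES}


def replicas_for(key):
    """Simplified placement (assumption): the hash picks the first replica,
    and the next RF-1 nodes in list order hold the other copies."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    first = int.from_bytes(digest[:8], "big") % len(NODES)
    return [NODES[(first + i) % len(NODES)] for i in range(RF)]


def write(coordinator, key, value):
    """Any node may coordinate a write; it forwards the data to the replicas."""
    if coordinator not in NODES:
        raise ValueError("unknown node: {}".format(coordinator))
    for replica in replicas_for(key):
        storage[replica][key] = value


def read(coordinator, key):
    """Any node may coordinate a read; it asks a replica rather than relying
    on a local copy, so the coordinator itself need not store the row."""
    if coordinator not in NODES:
        raise ValueError("unknown node: {}".format(coordinator))
    for replica in replicas_for(key):
        if key in storage[replica]:
            return storage[replica][key]
    return None


if __name__ == "__main__":
    write("node1", "user:42", "alice")
    holders = [node for node in NODES if "user:42" in storage[node]]
    print(holders)                                    # exactly 3 nodes hold the row
    print([read(node, "user:42") for node in NODES])  # every node can serve it
```

In this sketch only the three replica nodes ever have the row in their local storage, yet a read sent to any of the five nodes returns it, which matches what the question describes.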

Hopefully that answers your question or points you in the right direction.
