Data replication in Cassandra

cassandra nosql

We have a 5-node Cassandra cluster in which we created a keyspace with RF=3.

After creating a column family (table) and inserting data into it, the data appears to be replicated to all 5 nodes even though RF=3.

Shouldn't the data be replicated to only 3 nodes, and therefore be accessible on 3 nodes rather than all 5?

We are also using vnodes.

Best Answer

Perhaps I'm misunderstanding your question, but when you write data to the cluster, the partition key (the first part of the primary key) is hashed by the partitioner (Murmur3Partitioner by default), and the row is written to the node responsible for the token range that hash falls into. That node could be any one of your 5 nodes.

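In addition, your data is written at the same time to 2 other nodes, for a total of 3 copies spread around the cluster. Different partitions land on different sets of 3 nodes, so every node ends up holding part of the data. When you run nodetool status <keyspace> (it takes a keyspace name, not a column family), the "Owns (effective)" column should show each node responsible for roughly 60% of the data (3 copies / 5 nodes).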
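The 60% figure can be checked with a small simulation. The sketch below is not Cassandra's implementation: it uses MD5 as a stand-in for Murmur3, assumes 64 vnodes per node, and places replicas by walking the ring clockwise to the next distinct nodes (roughly what SimpleStrategy does). The node names and helper functions are made up for illustration.

```python
import bisect
import hashlib

NODES = ["node1", "node2", "node3", "node4", "node5"]  # hypothetical names
RF = 3
VNODES_PER_NODE = 64  # assumption for this sketch


def token(value: str) -> int:
    """Map a string to a ring position (MD5 stands in for Murmur3)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


def build_ring(nodes, vnodes):
    """Give each node several tokens; return parallel sorted lists."""
    pairs = sorted((token(f"{n}-vnode-{i}"), n) for n in nodes for i in range(vnodes))
    tokens = [t for t, _ in pairs]
    owners = [n for _, n in pairs]
    return tokens, owners


def replicas_for(key, ring, rf):
    """Walk clockwise from the key's token, collecting rf distinct nodes."""
    tokens, owners = ring
    i = bisect.bisect_left(tokens, token(key))
    replicas = []
    while len(replicas) < rf:
        owner = owners[i % len(owners)]  # wrap around the ring
        if owner not in replicas:
            replicas.append(owner)
        i += 1
    return replicas


def ownership(keys, ring, rf):
    """Fraction of all keys that each node stores a copy of."""
    counts = {n: 0 for n in NODES}
    for key in keys:
        for node in replicas_for(key, ring, rf):
            counts[node] += 1
    return {n: c / len(keys) for n, c in counts.items()}


if __name__ == "__main__":
    ring = build_ring(NODES, VNODES_PER_NODE)
    keys = [f"key-{i}" for i in range(10_000)]
    for node, share in ownership(keys, ring, RF).items():
        print(f"{node}: {share:.1%}")
```

Every key is stored on exactly 3 nodes, yet all 5 nodes hold data, and the per-node shares always add up to 300% (RF x 100%), which averages to 60% per node.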

This is also why the data looks accessible from all 5 nodes. It doesn't matter which node you query for a read: all nodes are equal, and any node can act as a coordinator that fetches the rows from the replicas that hold them and passes the results back to the client.

Hopefully that answers your question or points you in the right direction.

Related Solutions

What may be the consequences of deleting the system keyspace

The system keyspace is where a Cassandra node maintains information about itself and other nodes: network addresses, responsible token ranges, status from gossip, hints, schema, and other local information. The system keyspace has its own special replication strategy, known as "LocalStrategy":

cqlsh:system> desc KEYSPACE;

CREATE KEYSPACE system WITH replication = {
  'class': 'LocalStrategy'
};

This should tell you that data in the system keyspace isn't replicated to other nodes and is largely specific to the node that holds it.

Sometimes you can fix gossip-related issues by blowing away the system keyspace (or even just all entries in system.peers). Removing it via the filesystem (as you did) has the same effect. This works because it forces the node to rewrite all of that data, most of which it either already has (system.local) or can learn from gossip.

While I wouldn't make a habit of it, deleting the system keyspace via the filesystem is OK, as long as you have another node in the cluster for it to communicate with.

Help in understanding Hinted Handoffs and Data replication in Cassandra

Your understanding is correct: once a node has been down for longer than the hint window (3 hours by default), hints for it are no longer stored. This avoids overloading the remaining live nodes for a prolonged period of time.

If a node is down for longer than the hint window, it's absolutely necessary to rebuild the node, or even to remove all the files from its data directory and let it bootstrap as a brand new node. You may have to use replace_address if you are bringing the same node back up.
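The rule described above comes down to comparing the downtime with the hint window. Here is a minimal sketch of that decision, using the 3-hour default mentioned in the answer; the function name and return strings are hypothetical and not part of Cassandra or nodetool.

```python
from datetime import timedelta

# Default hint window stated in the answer above.
DEFAULT_HINT_WINDOW = timedelta(hours=3)


def recovery_action(downtime: timedelta,
                    hint_window: timedelta = DEFAULT_HINT_WINDOW) -> str:
    """Return the recovery path the answer recommends for a given downtime."""
    if downtime <= hint_window:
        # Hints were stored for the whole outage and can be replayed.
        return "replay hints"
    # Writes made after the window closed have no hints, so replay is not enough.
    return "rebuild or replace"


if __name__ == "__main__":
    print(recovery_action(timedelta(hours=1)))
    print(recovery_action(timedelta(hours=8)))
```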
