Cassandra Backup – From All Nodes or Just One?

cassandra

Cassandra is a distributed database, where each node is in sync with the other nodes from the same ring/cluster.

When taking backups based on a snapshot, do I need to back up each node individually or is one enough?

The docs say:

To take a global snapshot, run the nodetool snapshot command using a
parallel ssh utility, such as pssh.

Am I missing a point here?

Best Answer

The short answer: you have to take snapshots on all nodes.

As you pointed out, Cassandra is a distributed database. As an example, suppose you have 3 nodes with a replication factor (RF) of 2. Each node has primary responsibility for 1/3 of all the tokens in the ring. In addition, each node stores a replica of another node's range, so nodetool status will show "Owns 66.6%" for each node (2 replicas / 3 nodes).
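To make the arithmetic concrete, here is a minimal Python sketch of that 3-node, RF=2 ring. It is a simplified model, not how Cassandra actually places data: it assumes one token range per node (no vnodes) and that each range is replicated to its owner plus the next node clockwise. The function name replica_ownership is made up for this illustration.

```python
def replica_ownership(num_nodes: int, rf: int) -> dict[int, set[int]]:
    """Map each node to the set of token ranges it stores.

    Simplified model (an assumption, not Cassandra's real placement logic):
    one token range per node, each range stored on its primary node and on
    the next rf - 1 nodes clockwise around the ring.
    """
    stored: dict[int, set[int]] = {node: set() for node in range(num_nodes)}
    for primary in range(num_nodes):
        for offset in range(rf):
            stored[(primary + offset) % num_nodes].add(primary)
    return stored


if __name__ == "__main__":
    num_nodes, rf = 3, 2
    all_ranges = set(range(num_nodes))
    for node, ranges in replica_ownership(num_nodes, rf).items():
        missing = all_ranges - ranges
        share = len(ranges) / num_nodes
        # Each node holds 2 of the 3 ranges, so one range is always absent.
        print(f"node {node}: stores {sorted(ranges)}, "
              f"share {share:.3f}, missing {sorted(missing)}")
```

In this model every node holds two of the three ranges, so a backup of any single node is always missing one range.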

If you only back up one node, you only get the data that node is primarily responsible for plus whatever replicas are stored on it. Since the data is distributed, you will end up missing some data (one third of the token ranges in the example above) unless you take a snapshot on every node.
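If you do not have pssh handy, the same idea can be sketched in Python: run nodetool snapshot on every node over ssh, starting all of them at roughly the same time. This is only a sketch under assumptions: the host names and the snapshot tag are hypothetical, it assumes passwordless ssh and that nodetool is on the PATH of each node, and it defaults to a dry run that only builds the commands.

```python
import subprocess


def snapshot_command(host: str, tag: str) -> list[str]:
    """Build the ssh command that runs `nodetool snapshot -t <tag>` on one host."""
    return ["ssh", host, "nodetool", "snapshot", "-t", tag]


def snapshot_all(hosts: list[str], tag: str, dry_run: bool = True) -> list[list[str]]:
    """Take a snapshot with the same tag on every node in `hosts`.

    With dry_run=True (the default) nothing is executed; the commands are
    only returned so they can be inspected first.
    """
    commands = [snapshot_command(host, tag) for host in hosts]
    if not dry_run:
        # Start every ssh session first so the snapshots run in parallel,
        # then wait for all of them and collect the hosts that failed.
        procs = [subprocess.Popen(cmd) for cmd in commands]
        failed = [host for host, proc in zip(hosts, procs) if proc.wait() != 0]
        if failed:
            raise RuntimeError(f"snapshot failed on: {', '.join(failed)}")
    return commands


if __name__ == "__main__":
    # Hypothetical host names and tag, for illustration only.
    nodes = ["cass-node1", "cass-node2", "cass-node3"]
    for cmd in snapshot_all(nodes, tag="backup-example"):
        print(" ".join(cmd))
```

Using one tag for all nodes makes it easier to find the matching snapshot directories later when you copy them off each node.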