Apache Cassandra – How to Handle Dynamic Column Set Conflicts

cassandra

I know that each column in Apache Cassandra has a timestamp attached, and that read conflicts for a single column are resolved deterministically by comparing timestamps or, if those are equal, by comparing the values.

Let's say I add a column to a dynamic column set and write it to a single node. Later I add another column to the same column set, but this time through another node. How does Apache Cassandra merge these two? Will both columns exist after the merge?

Best Answer

The best way to think of a Cassandra cluster is not as a set of separate databases on different nodes, but as one single database. A column written through the first node becomes part of that one database and is replicated to the other nodes that hold the row. How many copies of your data are written is determined by your replication strategy and replication factor, and the copies converge on the same data.

Thus, if you tell the first node about a new column, the second node will automatically understand it and be able to access that data. If you write different data to the same column through the second node, it overwrites the old value when the row already has data in that column (the write with the later timestamp wins); otherwise it simply adds new data.

If you add a new column through the second node instead, the data in that column and the data in the first column both exist after the merge and can be queried through any node.
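
To make the merge behaviour concrete, here is a minimal sketch in Python. It is a toy model, not Cassandra's actual code: the names `Cell`, `reconcile`, and `merge_rows` are hypothetical, rows are modelled as plain dictionaries, and deletes (tombstones) are left out. It assumes the rule described in the question, namely that each column is reconciled on its own by timestamp, with ties broken by comparing values.

```python
from typing import Dict, NamedTuple


class Cell(NamedTuple):
    """One column value plus the write timestamp attached to it (toy model)."""
    value: bytes
    timestamp: int


Row = Dict[str, Cell]  # column name -> cell


def reconcile(a: Cell, b: Cell) -> Cell:
    """Pick the winner for a single column: newest timestamp first,
    and on equal timestamps the larger value, so the result is deterministic."""
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    return a if a.value >= b.value else b


def merge_rows(left: Row, right: Row) -> Row:
    """Merge two replicas' views of the same row, column by column."""
    merged = dict(left)
    for name, cell in right.items():
        merged[name] = reconcile(merged[name], cell) if name in merged else cell
    return merged


# Different columns written through different nodes: both survive the merge.
replica_1 = {"email": Cell(b"a@example.com", 100)}
replica_2 = {"phone": Cell(b"555-0100", 200)}
merged = merge_rows(replica_1, replica_2)
print(sorted(merged))  # ['email', 'phone']

# Same column written through different nodes: the later timestamp wins.
older = {"email": Cell(b"old@example.com", 100)}
newer = {"email": Cell(b"new@example.com", 300)}
print(merge_rows(older, newer)["email"].value)  # b'new@example.com'
```

Because the rule is applied per column and does not depend on the order in which the replicas are compared, merging in either direction gives the same row. That is why a conflict only arises when two writes touch the same column.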