Cassandra – reinserting a node


I had to remove a dead Cassandra node in a hurry (because an important script with a TRUNCATE query wouldn't work for consistency-level reasons; any thoughts on that would be nice too).

Anyway, the data is still there and I don't want to risk losing any of it. If I understand correctly, I'd need to delete the data (or move it away) and then reattach the node to the cluster as if it were a new one.

Could I just start the node with all its data still in place, or could that cause some sort of corruption?

Best Answer

The first thing I'd worry about is zombie data: the old data on your dead node could bring back rows that you truncated. To be sure, I'd move that data somewhere else. Also, delete the contents of the /commitlog and /saved_caches directories (or move them if you feel the need).
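The "move it out of the way" step could be scripted roughly as below. This is a minimal sketch, assuming a package-style install where the data, commitlog and saved_caches directories all sit under one root such as /var/lib/cassandra; the real locations are whatever `data_file_directories`, `commitlog_directory` and `saved_caches_directory` say in your cassandra.yaml. The function name `move_aside` is hypothetical, and it should only be run while Cassandra is stopped on that node.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# Assumption: all three directories live under one root (e.g. /var/lib/cassandra).
# Check cassandra.yaml for the actual paths on your node.
CASSANDRA_DIRS = ("data", "commitlog", "saved_caches")


def move_aside(cassandra_root: Path, backup_root: Path) -> list[Path]:
    """Hypothetical helper: move the node's data, commitlog and saved_caches
    directories into backup_root so the node can rejoin with no old data.
    Run only while Cassandra is stopped. Returns the new backup locations."""
    # Refuse to reuse an existing backup directory so nothing gets overwritten.
    backup_root.mkdir(parents=True, exist_ok=False)
    moved = []
    for name in CASSANDRA_DIRS:
        src = cassandra_root / name
        if src.exists():
            dest = backup_root / name
            shutil.move(str(src), str(dest))
            # Leave an empty directory in place. It is owned by whoever runs
            # this script, so chown it to the Cassandra user if that differs.
            src.mkdir()
            moved.append(dest)
    return moved
```

Moving rather than deleting keeps the old SSTables around in case something goes wrong while the node streams its data back.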

I'm assuming you have a suitable replication factor (greater than 1) on your data, so there will be a copy of it elsewhere in the cluster. When you re-add the node as a new one, its data will stream from the other nodes.
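Why the replication factor matters can be seen with a toy model of replica placement. This is a simplified sketch of SimpleStrategy-style placement, assuming one token per node (no vnodes) and ignoring racks, datacenters and NetworkTopologyStrategy; the ring, node names and function names are hypothetical. The first replica of a key goes to the node owning the first ring token at or after the key's token (wrapping around), and the remaining replicas go to the next distinct nodes clockwise.

```python
from __future__ import annotations

from bisect import bisect_left


def replicas_for(token: int, ring: dict[int, str], rf: int) -> list[str]:
    """Toy SimpleStrategy: walk the ring clockwise from the first node token
    >= the key's token and collect rf distinct nodes."""
    tokens = sorted(ring)
    start = bisect_left(tokens, token) % len(tokens)  # wrap past the last token
    replicas: list[str] = []
    for i in range(len(tokens)):
        node = ring[tokens[(start + i) % len(tokens)]]
        if node not in replicas:
            replicas.append(node)
        if len(replicas) == rf:
            break
    return replicas


def survives_loss(ring: dict[int, str], rf: int, dead: str) -> bool:
    """True if every token range still has at least one replica on a node
    other than `dead`. Each range is identified by the node token that ends it."""
    return all(
        any(node != dead for node in replicas_for(t, ring, rf)) for t in ring
    )


ring = {0: "A", 100: "B", 200: "C"}  # hypothetical three-node ring
print(replicas_for(150, ring, 2))     # range (100, 200] -> C, then A
print(survives_loss(ring, 1, "B"))    # RF=1: B's range has no other copy
print(survives_loss(ring, 2, "B"))    # RF=2: every range has another copy
```

With RF=1, wiping the removed node's data would lose its ranges for good; with RF=2 or more, a node rejoining empty can stream every range back from the surviving replicas.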