What should we consider when configuring a Cassandra cluster?

cassandra

I'm new to Cassandra, so I want to understand the basic things we should consider when configuring a Cassandra cluster in one or multiple data centers.

Thanks!

Best Answer

what basic things should we consider when configuring a Cassandra cluster in one or multiple data centers?

This is a bit of an open-ended question, but I build multi-DC Cassandra deployments on a fairly regular basis, so I'll try to provide some insight here.

The first thing to consider is what you hope to achieve by building a presence in both data centers. The two main reasons are usually data locality and disaster recovery.

Data locality refers to the application that Cassandra is serving: if an application team is deploying to two DCs, they're going to want their data nearby in each one. Disaster recovery matters if you lose an entire DC. As long as the surviving DC holds replicas of your data, you can easily and quickly rebuild another logical or physical DC from it.
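To see why locality and disaster recovery go hand in hand, here is a small Python sketch of the quorum arithmetic. It assumes NetworkTopologyStrategy with a hypothetical replication factor of 3 in each of two DCs named dc1 and dc2, and uses the usual majority rule (floor(RF / 2) + 1): LOCAL_QUORUM only counts replicas in the local DC, while QUORUM counts replicas across all DCs.

```python
from typing import Dict


def quorum(replicas: int) -> int:
    """Majority of the given replica count: floor(replicas / 2) + 1."""
    return replicas // 2 + 1


def local_quorum(rf_per_dc: Dict[str, int], local_dc: str) -> int:
    """Replicas that must respond at LOCAL_QUORUM (only the local DC counts)."""
    return quorum(rf_per_dc[local_dc])


def global_quorum(rf_per_dc: Dict[str, int]) -> int:
    """Replicas that must respond at QUORUM (replicas in every DC count)."""
    return quorum(sum(rf_per_dc.values()))


rf = {"dc1": 3, "dc2": 3}  # hypothetical DC names and replication factors
live_after_dc2_outage = rf["dc1"]  # only dc1's replicas are still reachable

print(local_quorum(rf, "dc1"), global_quorum(rf))  # 2 4
print(live_after_dc2_outage >= local_quorum(rf, "dc1"))  # True
print(live_after_dc2_outage >= global_quorum(rf))  # False
```

With these numbers, losing dc2 means QUORUM (4 of 6) can no longer be met, while LOCAL_QUORUM in dc1 (2 of 3) still can. That is why multi-DC applications commonly read and write at LOCAL_QUORUM: requests stay in the nearby DC and keep working if the other DC disappears.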

Another point to consider is whether or not the data centers are hosted by the same provider. Assuming a cloud deployment, you'll want to understand the network between the cloud regions: what your bandwidth is, and how much cross-region traffic will cost. Obviously, things will be easier if your nodes can reliably communicate with each other across DCs when they need to.
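As a rough way to size (and price) that cross-region traffic, here is a back-of-the-envelope Python sketch. It assumes each write crosses the WAN about once per remote DC (as I understand it, the coordinator sends a mutation to one node in each remote DC, which relays it to the other replicas there), and it ignores compression, read repair, hints, repair streaming, and gossip. The write rate, mutation size, and per-GB price are made-up numbers; substitute your own workload and your provider's actual pricing.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assume a 30-day month


def monthly_cross_dc_gb(writes_per_sec: float, avg_mutation_bytes: float, remote_dcs: int) -> float:
    """Approximate GB sent across the WAN per month to replicate writes.

    Assumes one copy of each mutation crosses to each remote DC.
    """
    bytes_per_month = writes_per_sec * avg_mutation_bytes * remote_dcs * SECONDS_PER_MONTH
    return bytes_per_month / 1e9


# Hypothetical workload: 2,000 writes/s of ~1 KB each, replicated to one remote DC.
gb = monthly_cross_dc_gb(writes_per_sec=2000, avg_mutation_bytes=1000, remote_dcs=1)
hypothetical_price_per_gb = 0.02  # made-up USD price; check your provider
print(f"{gb:.0f} GB/month, ~${gb * hypothetical_price_per_gb:.2f}")
```

Even a modest write rate adds up to terabytes per month, so it's worth running your own numbers before choosing regions or providers.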

If the data center providers are different, you'll want to be very sure about how consistent the network connection between them is. In this scenario, you should set phi_convict_threshold in cassandra.yaml (default 8) to a higher value. That gives your nodes more leniency before marking other nodes as down when the network becomes unstable.
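For intuition about what raising the threshold buys you, here is a simplified Python model of a phi accrual failure detector. It assumes heartbeat inter-arrival times are exponentially distributed, which gives phi = t / (mean * ln 10), where t is the time since the last heartbeat and mean is the average interval between heartbeats. As far as I know this is close to what Cassandra's detector computes, but treat the exact numbers as illustrative. A peer is convicted (marked down) once phi exceeds the threshold.

```python
import math


def phi(seconds_since_last_heartbeat: float, mean_interval: float) -> float:
    """Suspicion level: -log10 of the probability that a heartbeat is merely late.

    Under an exponential model, P(no heartbeat for t seconds) = exp(-t / mean),
    so phi = -log10(exp(-t / mean)) = t / (mean * ln 10).
    """
    return seconds_since_last_heartbeat / (mean_interval * math.log(10))


def seconds_until_convicted(threshold: float, mean_interval: float) -> float:
    """Silence needed before phi reaches the threshold and the peer is marked down."""
    return threshold * mean_interval * math.log(10)


# Assume heartbeats arrive about once per second on average.
for threshold in (8, 10, 12):
    print(threshold, round(seconds_until_convicted(threshold, 1.0), 1))
```

In this model the time to conviction grows linearly with the threshold, so going from the default of 8 to 12 means a peer must be silent about 50% longer before it's marked down. That is the leniency you want on a flaky inter-provider link; the trade-off is that genuinely dead nodes are also noticed later.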

You will also want to change the replication strategy of the system_auth keyspace to NetworkTopologyStrategy, and make sure it has an appropriate number of replicas in each DC. Otherwise, your authentication and authorization requests may have to cross DCs.
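Here is a small Python helper that builds the CQL statement for that change, so the per-DC replica counts live in one place. The DC names and replica counts in the example are hypothetical: the names must match the data center names your snitch reports (for example, in the output of nodetool status), and 3 per DC is just a common example, not a rule.

```python
from typing import Dict


def system_auth_replication_cql(replicas_per_dc: Dict[str, int]) -> str:
    """Build an ALTER KEYSPACE statement moving system_auth to NetworkTopologyStrategy."""
    if not replicas_per_dc:
        raise ValueError("specify at least one data center")
    options = ["'class': 'NetworkTopologyStrategy'"]
    # One entry per DC: '<dc name>': <replication factor in that DC>
    options += [f"'{dc}': {rf}" for dc, rf in replicas_per_dc.items()]
    return "ALTER KEYSPACE system_auth WITH replication = {" + ", ".join(options) + "};"


# Hypothetical DC names; use the names shown by `nodetool status`.
print(system_auth_replication_cql({"dc1": 3, "dc2": 3}))
```

After running the generated statement in cqlsh, run nodetool repair system_auth on each node so the existing credentials are copied to the new replicas.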