If I run nodetool repair --full
I see settings like this:
repairing keyspace some_keyspace with repair options
(parallelism: parallel, primary range: false, incremental: false,
job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 735)
But if I run nodetool repair --full --dc-parallel
I see this:
repairing keyspace some_other_keyspace with repair options
(parallelism: dc_parallel, primary range: false, incremental: false,
job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 1992)
Note the parallelism setting. Since Cassandra 2.2, "parallel" has been the default. What is the difference between "parallel" and "dc_parallel"? The documentation implies that dc_parallel is faster because it operates in all DCs at once, but in my brief testing the opposite seems to be true. Perhaps this is due to my replication settings (most keyspaces use NetworkTopologyStrategy and place an equal number of replicas in each of my DCs), but I would like something more certain than a guess. "parallel" is not well documented.
For what it's worth, I am running Cassandra 3 (DSE 5.0.3).
Best Answer
The problem here is the poor choice of name for the --dc-parallel option, IMO. Currently there are three "parallelism degrees" that can be specified for a repair operation: sequential, parallel and datacenter-aware (dc-parallel).
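To make the three degrees concrete, here is a rough Python sketch of how the Merkle-tree validation step could be grouped under each one. It is a simplified model, not Cassandra code: it assumes that sequential validates one replica at a time, parallel validates every replica at once, and dc-parallel validates one replica per datacenter at a time with the datacenters running side by side. The function name `validation_waves` and the node names are made up for illustration, and a real cluster does not move in lock-step "waves" like this.

```python
from itertools import zip_longest


def validation_waves(replicas_by_dc, parallelism):
    """Group replicas into 'waves' that would validate concurrently.

    Simplified model (an assumption, not Cassandra's implementation):
      sequential  -> one replica per wave
      parallel    -> all replicas in a single wave
      dc_parallel -> one replica from each DC per wave
    """
    all_replicas = [r for dc in replicas_by_dc.values() for r in dc]

    if parallelism == "sequential":
        return [[r] for r in all_replicas]
    if parallelism == "parallel":
        return [all_replicas]
    if parallelism == "dc_parallel":
        # Walk the DCs side by side; DCs with fewer replicas simply drop out.
        waves = zip_longest(*replicas_by_dc.values())
        return [[r for r in wave if r is not None] for wave in waves]
    raise ValueError(f"unknown parallelism: {parallelism}")


# Hypothetical cluster: two DCs with three replicas each for some token range.
replicas = {
    "dc1": ["a1", "a2", "a3"],
    "dc2": ["b1", "b2", "b3"],
}

for mode in ("sequential", "parallel", "dc_parallel"):
    waves = validation_waves(replicas, mode)
    print(f"{mode}: {len(waves)} wave(s) -> {waves}")
```

Under this model, dc-parallel sits between the other two: it needs fewer rounds than sequential but more than parallel, which would be consistent with --dc-parallel appearing slower than the default in the question when every DC holds the same number of replicas.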