Cassandra nodetool repair open files issue

cassandra

I am using Cassandra 3.6. After running nodetool repair, Cassandra takes too long to start. The message is:

ViewManager.java:226 - Not submitting build tasks for views in keyspace system_schema as storage service is not initialized

The system has been stuck on this message for hours. Any suggestions, please? The number of open files also rose significantly, from 100k to 1.5 million.

Best Answer

Since SSTables can contain tokens from multiple token ranges, and repair is performed by token range, it was necessary to be able to separate repaired data from unrepaired data. That process is called anticompaction.

Leveled Compaction Strategy (LCS) is a very intensive strategy in which SSTables get compacted far more often than with STCS and TWCS. LCS creates fixed-size SSTables, which can easily lead to thousands of SSTables for a single table. Because of the way streaming works in Apache Cassandra during repair, overstreaming of LCS tables can create tens of thousands of small SSTables in L0, which can ultimately bring nodes down and affect the whole cluster. This is particularly true when the nodes use a large number of vnodes. I have seen this happen on several customer clusters, and it then takes a lot of operational expertise to bring the cluster back to a sane state.

A safety measure prevents SSTables that are going through anticompaction from being compacted, for valid reasons. The problem is that it also prevents those SSTables from going through validation compaction, which causes repair sessions to fail if an SSTable is being anticompacted. Given that anticompaction also occurs with full repairs, this creates the following limitation: you cannot run a repair on more than one node at a time without risking failed sessions due to concurrency on SSTables.

The only way to perform repair without anticompaction in "modern" versions of Apache Cassandra is subrange repair, which fully skips anticompaction. To perform a subrange repair correctly, you have three options:

Compute valid token subranges yourself and script repairs accordingly
Use the Cassandra range_repair.py script, which performs subrange repair
Use Cassandra Reaper, which also performs subrange repair. Search for it by name, as the repository location may change
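
As a rough illustration of the first option, here is a minimal Python sketch that splits one token range into equal subranges and prints a nodetool repair command for each, using nodetool's -st (start token) and -et (end token) options. The function names, the keyspace name my_keyspace, and the example bounds are hypothetical. The sketch assumes the default Murmur3Partitioner and assumes you pass in the bounds of a single token range taken from your own cluster (for example from nodetool ring); it does not discover ranges for you.

```python
# Token bounds of Murmur3Partitioner (assumption: the cluster uses the default partitioner).
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def split_range(start, end, parts):
    """Split the token range (start, end] into `parts` contiguous subranges."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    if end <= start:
        raise ValueError("end must be greater than start (wrapping ranges are not handled)")
    width = end - start
    # Integer arithmetic keeps the boundaries exact even for 64-bit tokens.
    bounds = [start + (width * i) // parts for i in range(parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def repair_commands(keyspace, start, end, parts):
    """Build one subrange repair command per subrange (keyspace name is supplied by the caller)."""
    return [
        f"nodetool repair -st {s} -et {e} {keyspace}"
        for s, e in split_range(start, end, parts)
    ]


if __name__ == "__main__":
    # Hypothetical example: split one token range into 4 subranges.
    for cmd in repair_commands("my_keyspace", 0, 100, 4):
        print(cmd)
```

Running the commands one at a time, and on one node at a time, keeps each repair session small. Tools such as Reaper automate the same idea and also handle scheduling and retries.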

To decrease the number of open files and shorten the restart time, run: nodetool cleanup; nodetool compact
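
To see which tables are responsible for the open files, and to check whether cleanup and compaction reduced them, you can count the SSTable data files per table. The following Python sketch is only an illustration: the function name is hypothetical, and it assumes the usual on-disk layout of <data_dir>/<keyspace>/<table_dir>/...-Data.db and a data directory of /var/lib/cassandra/data, which may differ on your installation. Files under snapshots and backups subdirectories are deliberately not counted.

```python
from collections import Counter
from pathlib import Path


def sstable_counts(data_dir):
    """Count live SSTables per table directory under a Cassandra data directory."""
    counts = Counter()
    # Assumed layout: <data_dir>/<keyspace>/<table_dir>/<sstable>-Data.db
    # The fixed depth of the pattern skips snapshots/ and backups/ subdirectories.
    for data_file in Path(data_dir).glob("*/*/*-Data.db"):
        table_dir = data_file.parent
        counts[f"{table_dir.parent.name}/{table_dir.name}"] += 1
    return counts


if __name__ == "__main__":
    # Assumption: default data directory of a package install; adjust to your setup.
    for table, n in sstable_counts("/var/lib/cassandra/data").most_common(10):
        print(f"{n:8d}  {table}")
```

Each SSTable consists of several component files, so a table with tens of thousands of entries here is a likely contributor to the open file count.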

Related Solutions

Cassandra upgrade/repair issues in migration

It turns out that the original cluster had different DC and rack names in its topology. I changed the names, copied the data again, and it works.

How to Get nodetool Without Cassandra

The easiest (non-invasive) way is probably to download the tarball installation (you'll need to select either a Mac or Linux-based OS for it to allow you to download the tarball). Based on your mention of disabling the service, I'm going to guess that you want to accomplish this on Windows. If that's not the case, please say so in the comments.

Un-tar dsc-cassandra-2.0.8-bin.tar.gz to the location you want to run Nodetool out of. ex:

$ cd /tools
$ tar -zxvf dsc-cassandra-2.0.8-bin.tar.gz

Note: You may have a different application you use for tarballs. I ran this from a Cygwin terminal.

Find the location of your JRE/JDK (not the bin directory) and set that as your "JAVA_HOME" (System) environment variable. When you have it set properly, you should be able to query it via CMD:

>echo %JAVA_HOME%
C:\Program Files (x86)\Java\jre7

Once you have JAVA_HOME set, it should work from either CMD or Powershell:

C:\tools\dsc-cassandra-2.0.8\bin>nodetool -h 192.168.1.85 status
Starting NodeTool
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: datacenter1
========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens  Owns    Host ID                               Rack
UN  192.168.1.85  506.29 MB  256     100.0%  cd39f0fe-ed67-40cf-b6bd-504cedabf497  rack1

This way, you can run nodetool without messing with an installer or services.
