How can a Cassandra node see another node as down?


I'm running Cassandra on three nodes. Here's their nodetool status output:

ubuntu@ip-10-0-8-8:~$ nodetool status
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns    Host ID                               Rack
UN  10.0.9.8   2.07 MB    256     ?       c8d574b9-540c-410f-9326-789eb75d3d14  1c
UN  10.0.8.8   2.06 MB    256     ?       d9454056-a358-4428-ab5f-c03e8042167e  1d
UN  10.0.10.8  2.01 MB    256     ?       3617643d-b0a8-4b72-a9d4-feded4445292  1a

ubuntu@ip-10-0-9-8:~$ nodetool status
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns    Host ID                               Rack
UN  10.0.9.8   2.07 MB    256     ?       c8d574b9-540c-410f-9326-789eb75d3d14  1c
UN  10.0.8.8   2.06 MB    256     ?       d9454056-a358-4428-ab5f-c03e8042167e  1d
DN  10.0.10.8  2.09 MB    256     ?       3617643d-b0a8-4b72-a9d4-feded4445292  1a

ubuntu@ip-10-0-10-8:~$ nodetool status
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns    Host ID                               Rack
UN  10.0.9.8   2.07 MB    256     ?       c8d574b9-540c-410f-9326-789eb75d3d14  1c
UN  10.0.8.8   2.06 MB    256     ?       d9454056-a358-4428-ab5f-c03e8042167e  1d
UN  10.0.10.8  2.01 MB    256     ?       3617643d-b0a8-4b72-a9d4-feded4445292  1a

Everything looks fine except for one thing, the last line of the second block (the output from 10.0.9.8):

DN  10.0.10.8  2.09 MB    256     ?       3617643d-b0a8-4b72-a9d4-feded4445292  1a

The D at the start of the line indicates that the node is down. How can 10.0.9.8 see the node as down while the other nodes see it as up? Does this lead to inconsistencies?

I'm using Cassandra 2.1.1, by the way.
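To spot this kind of disagreement on a larger cluster, one option is to filter each node's nodetool status output for rows whose code starts with D. The sketch below is a hypothetical helper (down_nodes is not part of nodetool); it assumes the column layout shown above, where the first field is the two-letter status/state code (U or D followed by N, L, J or M) and the second field is the node's address.

```shell
#!/bin/sh
# Hypothetical helper, not a nodetool feature: reads `nodetool status`
# output on stdin and prints the address of every node that this host
# currently reports as Down (DN, DL, DJ or DM in the first column).
# Assumes the layout shown above: field 1 = status/state, field 2 = address.
down_nodes() {
  awk '$1 ~ /^D[NLJM]$/ { print $2 }'
}

# Usage, run on each node:
#   nodetool status | down_nodes
```

With the outputs in the question, this would print 10.0.10.8 only when run on 10.0.9.8, and nothing on the other two nodes, which makes the one-sided view easy to see.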

Best Answer

Running nodetool enablegossip on the host that appeared down to the other nodes fixed it for me, at least for now. In my case, however, the host appeared as down to every other node I checked, not just one. I'm running in a non-cloud environment.

I was curious what my other nodes reported, and I found one with the same issue as yours (like your 10.0.9.8, it showed 10.0.10.8 as down). Running only nodetool enablegossip on 10.0.10.8 didn't help. But running nodetool disablegossip first and then nodetool enablegossip again did!
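The workaround above can be wrapped in a small function so both steps always run in order. This is a minimal sketch: reset_gossip is a hypothetical name, and it only chains the two nodetool commands from the answer, stopping if either one fails. It is meant to be run on the node that the others report as down (10.0.10.8 in the question).

```shell
#!/bin/sh
# Hypothetical wrapper around the workaround from the answer.
# Run it on the node that other nodes report as down.
reset_gossip() {
  # Per the answer, enablegossip on its own did not help,
  # so disable gossip first.
  nodetool disablegossip || return 1
  # Then re-enable gossip on this node.
  nodetool enablegossip || return 1
}

# Usage:
#   reset_gossip
# then, on the node that showed DN (10.0.9.8 in the question):
#   nodetool status
```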