Cassandra – Read from Partition in Local Node Only


I'm aware that I can read from any Cassandra node, and that it acts as a coordinator that reads from whichever node holds a specific partition. But can I read only the data from partitions stored on the node I'm connecting to?

In other words, when a Cassandra cluster has 10 nodes, each node holds roughly a tenth of the partitions (and maybe replicas of partitions from other nodes when RF is set). When I send a SELECT * FROM TABLE, I would like to get only the 1/10th of the total data that is actually stored on that specific node, without any traffic to other nodes.

Thank you so much!

Best Answer

You can do it as follows (class names, etc. are for driver 3.x and could be slightly different in 4.x):

  • Identify which token ranges are handled by the specific node. You can obtain this information from the Metadata class via cluster.getMetadata().getTokenRanges(keyspace, host) (see doc).
  • For every token range, repeat:
    • Generate a query like select * from table where token(part_keys) > rangeStart AND token(part_keys) <= rangeEnd. This form does not work for every range: you need to handle the case where the node's token range wraps around from the end of the token ring to its beginning (see linked code).
    • Create a Statement instance for each query (for example, a SimpleStatement), set its consistency level to LOCAL_ONE, and set the host to which the query should be sent (via the setHost function).
    • Execute the query using the execute or executeAsync function and process the data.
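
The query-generation step, including the wrap-around case, can be sketched in plain Java as below. This is only an illustration under some assumptions: the cluster uses Murmur3Partitioner (so tokens fit in a signed 64-bit long and Long.MIN_VALUE is the ring's minimum token), and the class name LocalRangeQueries, the method queriesFor, and the table and key names are hypothetical, not part of the driver. In real code the range bounds would come from the token ranges returned by getTokenRanges, and each generated string would be wrapped in a Statement configured as described in the steps above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a driver class: turns one token range (start, end]
// into the CQL queries needed to read it.
final class LocalRangeQueries {
    // Assumption: Murmur3Partitioner, whose minimum token is Long.MIN_VALUE.
    static final long MIN_TOKEN = Long.MIN_VALUE;

    static List<String> queriesFor(String table, String partitionKeys, long start, long end) {
        String token = "token(" + partitionKeys + ")";
        String select = "SELECT * FROM " + table;
        List<String> queries = new ArrayList<>();
        if (start < end) {
            // Ordinary range: start is exclusive, end is inclusive.
            queries.add(select + " WHERE " + token + " > " + start
                    + " AND " + token + " <= " + end);
        } else if (start > end) {
            // Wrap-around range: split it into the part up to the end of the
            // ring and the part from the beginning of the ring.
            queries.add(select + " WHERE " + token + " > " + start);
            if (end != MIN_TOKEN) {
                queries.add(select + " WHERE " + token + " <= " + end);
            }
        } else if (start == MIN_TOKEN) {
            // (MIN, MIN] is treated here as the whole ring.
            queries.add(select);
        }
        // Any other range with start == end is treated as empty: no queries.
        return queries;
    }

    public static void main(String[] args) {
        // Example: a range that wraps from the end of the ring to its beginning.
        for (String q : queriesFor("my_keyspace.my_table", "id", 9000L, -9000L)) {
            System.out.println(q);
        }
    }
}
```

In driver 3.x, TokenRange also has an unwrap() method that performs a similar split on the driver's own range objects, so a helper like this is mainly useful when working with raw token values.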

Source code that performs a full cluster scan can be found here. You can reuse pieces of it, such as the generation of queries on token ranges.