Cassandra query performs differently at different times

cassandra

I am using apache-cassandra-2.0.12 in production with NetworkTopologyStrategy and a replication factor of 3, in a cluster with 2 DCs of 4 nodes each.

While analyzing the response times for read requests, we found that some queries are sometimes much slower than they normally are.
E.g., consider the following table:

Create ColumnFamily "Employee"
(
  empID bigint,
  uniqueID text,
  col1 text,
  col2 text,
  col3 text,
  primary key (empID,uniqueID)
)

This CF contains data for more than 5K row entries, and each employee contains a minimum of 100K columns and a maximum of 1000K columns.

In this CF, the response time for the following query differs drastically from time to time:

SELECT * FROM "Employee" WHERE empID = xxx AND uniqueID = 'value';

Sometimes the response time for the above query is more than 3 seconds, whereas it should normally be within 50 milliseconds.

I have monitored the load (compaction time, disk utilization, etc.), CPU, and memory of the nodes at those times. All of these parameters were normal.

Is there anything that I have missed, or is this normal behavior for Cassandra?

Note: I don't have any tombstone columns in this CF

Best Answer

primary key (empID,uniqueID)

This CF contains data for more than 5K row entries, and each employee contains a minimum of 100K columns and a maximum of 1000K columns.

That's WAY too many rows per partition. My guess is that the query slowdowns happen when large partitions are queried. It all depends on data cell value size and data width, but as a general rule, I would not model more than 10k-30k rows per partition.
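To put rough numbers on that rule of thumb, here is a back-of-the-envelope sketch in Python. The 20k target is only an assumed midpoint of the 10k-30k range, and the function name is made up for illustration.

```python
import math

def buckets_needed(rows_per_employee: int, target_rows_per_partition: int = 20_000) -> int:
    """Number of partitions one employee's rows would have to be split across."""
    return max(1, math.ceil(rows_per_employee / target_rows_per_partition))

# The smallest and largest employees described in the question.
for rows in (100_000, 1_000_000):
    print(rows, "rows ->", buckets_needed(rows), "partitions")
```

With these assumptions, the smallest employees would need about 5 partitions each and the largest about 50, which gives a sense of how far the current model is from the suggested range.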

To test this out, you could run nodetool tablehistograms (the command is named cfhistograms on Cassandra 2.0) on your table to gauge things like max cell count and partition size. Then run your query against both small and large partitions with TRACING ON in cqlsh, and I'm sure you'll see the difference.

Basically, try reworking your model for smaller partitions.
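One possible way to do that, sketched below in Python, is to add a bucket to the partition key and derive it from uniqueID, so that a point lookup can still compute which partition to read. Everything new here is an assumption for illustration: the table name employee_bucketed, the bucket column, the choice of 64 buckets, and the md5-based hash. Treat it as a sketch, not a drop-in schema.

```python
import hashlib

NUM_BUCKETS = 64  # assumption: 1000K rows / 64 buckets is roughly 16K rows per partition

# Hypothetical reworked table: (empID, bucket) together form the partition key,
# and uniqueID remains the clustering column.
CREATE_TABLE = """
CREATE TABLE employee_bucketed (
  empID bigint,
  bucket int,
  uniqueID text,
  col1 text,
  col2 text,
  col3 text,
  PRIMARY KEY ((empID, bucket), uniqueID)
);
"""

# Point lookup against the reworked table (placeholders for a driver's bind values).
SELECT_ONE = (
    "SELECT * FROM employee_bucketed "
    "WHERE empID = %s AND bucket = %s AND uniqueID = %s;"
)

def bucket_for(unique_id: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Stable bucket for a uniqueID.

    md5 is used instead of Python's built-in hash(), which is randomized
    per process and would send the same uniqueID to different buckets.
    """
    digest = hashlib.md5(unique_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets

def lookup_params(emp_id: int, unique_id: str) -> tuple:
    """Bind values for SELECT_ONE, in placeholder order."""
    return (emp_id, bucket_for(unique_id), unique_id)
```

The trade-off is that reading every row for one employee now means querying all of that employee's buckets, and existing data would have to be rewritten into the new table. Whether that is acceptable depends on how often you need the full set versus a single uniqueID.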