Keeping rows ordered by counter value

cassandranosql

I'm storing linked words in a Cassandra v2 cluster and I'm having a hard time doing efficient reads. I'm trying to fetch the top X rows with the highest counter value.

  1. Cassandra doesn't store the rows in any ordered way, so simply adding LIMIT X to the CQL query returns random-ish rows.
  2. The ORDER BY command in CQL only works for columns that are (part of) the key.
  3. OrderPreservingPartitioner doesn't exist anymore in Cassandra v2.

At the moment, when doing a read, I have to fetch all rows and manually select those X rows with the highest counter value. This literally slows some queries down by 100x, because even mildly popular words often have hundreds of thousands of words linked to them, not to mention the waste of bandwidth and other resources.

Are there any clever techniques to make this more efficient?

Best Answer

I'll try to address your concerns one at a time.

  1. Cassandra doesn't store the rows in any ordered way, so simply adding LIMIT X to the CQL query returns random-ish rows.

  2. The ORDER BY command in CQL only works for columns that are (part of) the key.

Not entirely correct, on either point. The on-disk sort order in Cassandra is determined by the first clustering column of your primary key. As a result, ORDER BY will also only work on that column (and only that column).

I'm trying to fetch the top X rows with the highest counter value.

Cassandra will not allow you to put a secondary index on a counter column, nor can you make it a part of your key (which also means you cannot sort by it). Therefore, querying data by the value of a counter column is not possible.

It would appear that your use case is not an appropriate fit for Cassandra. I would suggest solving this issue with a relational database or something else that would provide you with the necessary aggregation tools.