Right now the table only has about 320k records and I can use ALLOW FILTERING with no problem, but I realize this might not always be the case.
So here's the thing: Cassandra is very good at querying data by a specific key. It is also good at retrieving a range of data within a partition.
"SELECT * FROM {}.{} WHERE timestamp > {} ALLOW FILTERING;"
But due to its distributed nature, it is not good at scanning an entire table to compile a result set. And that's what you are asking it to do with the above query.
Network traffic is expensive. So the main goal with Cassandra, is to ensure that your query is served by a single node. When using ALLOW FILTERING
without specifying your partition key (name) causes your query to require a coordinator node, and check each node in your cluster for values that may match your WHERE clause.
Essentially, the more nodes that are in your cluster, the more detrimental ALLOW FILTERING
becomes to performance (unless you specify at least your partition key...only then you are guaranteeing that your query can be served by a single node). Note, that your slower query actually does this right, and solves that problem for you.
I had an idea to add another field that would be the day of the week, and maybe a month field as well.
And this is a good idea!
It solves two problems.
- It ensures that your query will be served by a single node.
- It protects your partitions from getting too big.
Cassandra has a limit of 2 billion cells per partition. As your partition key is "name" and you keep adding unique timestamps inside it, you will progress toward that limit until you either reach it, or your partition becomes too big to use (probably the latter).
Here is how I would solve this:
CREATE TABLE cryptocoindb.worldcoinindex_byday (
daybucket text,
name text,
datetime timestamp,
label text,
price_btc double,
price_cny double,
price_eur double,
price_gbp double,
price_rur double,
price_usd double,
volume_24h double,
PRIMARY KEY (daybucket, datetime, name)
) WITH CLUSTERING ORDER BY (datetime DESC, name ASC);
Now you could query like this:
SELECT * FROM cryptocoindb.worldcoinindex
WHERE daybucket='20170825' AND datetime > '2017-08-25 17:20';
Additionally, by clustering your rows on "datetime" descending, you are ensuring that the most-recent data is at the top of each cell (giving Cassandra less to have to parse through).
I moved "name" to be the last clustering column, just to maintain uniqueness. If you're never going to query by "name," then it doesn't make sense to use it as your partition key.
Hope this helps.
Note: I changed your timestamp int
to datetime timestamp
because it added clarity to the example. You can use whatever works for you, but just be careful of confusion arising from naming a column after a data type.
Edit 20170826
Is the following the same as your code or different? PRIMARY KEY ((daybucket, datetime), name)
No, that's not the same. That's using something called a composite partition key. It will give you better data distribution in your cluster, but it will make querying harder for you, and basically set you back to doing table scans.
For a good, comprehensive description of Cassandra primary keys, Carlo Bertuccini has great answer on StackOverflow:
https://stackoverflow.com/questions/24949676/difference-between-partition-key-composite-key-and-clustering-key-in-cassandra/24953331#24953331
Is there a way to alter the way Cassandra reads timestamps or an easy way to make changes to that whole datafield to alter the timestamp so it will be correctly read?
Not really. Cassandra timestamps can be tricky to work with. They store with millisecond precision, but don't actually show that full precision when queried. Also, as of one of the 2.1 patches, it automatically displays time in GMT; so that can be confusing to people as well. If your way of managing timestamps on the application side is working for you, just stick with that.
Best Answer
Appears this feature is deceptive and what you're trying to do is not yet possible. Slated for v3.
https://issues.apache.org/jira/browse/CASSANDRA-6377