How to deal with unbounded growth in cassandra shards

cassandranosqlpartitioningscalabilitysharding

Let's say we have a website like youtube where we have a comment session per e video. Users want to be able to comment and to see the comments made by other users by comment creation time. This is relatively easy to shard when you know there will be just a small amount of comments per video or you have a limit on the amount of comments per video. But how can we shard these comments when there growth is unbounded?

I have found very few information about this problem.
https://medium.com/@foundev/synthetic-sharding-in-cassandra-to-deal-with-large-partitions-2124b2fd788b

Best Answer

Usually, people add some 'bucket' field to the partition key - it could be simple bucket, with limited number of possible values (1-10, for example), or it could be time-bounded bucket, like, year+month, or even daily bucket. Main thing here is that it should be predictable, so you can send queries against these buckets.

With comments it's becoming trickier, as for really popular videos the number of comments could grow really fast, but this could be solved by lookup table (or as in article, shard table). In this case you need to do a read from this table as a first step, and then send many reads in parallel to fetch necessary partitions - this could be relatively fast, as you don't want to load all comments at once, and you're usually show only recent comments, and then load older on demand.