I have cassandra table defined as:
CREATE TABLE db.table (
value text,
time timestamp,
sid text,
PRIMARY KEY (sid, time)
) WITH CLUSTERING ORDER BY (time ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
Table is pretty big, here are data files:
root@server:/home# ls -l /var/lib/cassandra/data/db/table/*-Data.db -h
-rw-r--r-- 1 cassandra cassandra 8.7G Aug 16 00:14 /var/lib/cassandra/data/db/table/db-table-ka-200785-Data.db
-rw-r--r-- 1 cassandra cassandra 2.2G Nov 5 21:28 /var/lib/cassandra/data/db/table/db-table-ka-208079-Data.db
-rw-r--r-- 1 cassandra cassandra 20G Nov 20 20:55 /var/lib/cassandra/data/db/table/db-table-ka-208702-Data.db
-rw-r--r-- 1 cassandra cassandra 18G Dec 17 17:31 /var/lib/cassandra/data/db/table/db-table-ka-210089-Data.db
-rw-r--r-- 1 cassandra cassandra 455M Dec 18 14:15 /var/lib/cassandra/data/db/table/db-table-ka-210153-Data.db
-rw-r--r-- 1 cassandra cassandra 4.4G Dec 27 09:27 /var/lib/cassandra/data/db/table/db-table-ka-210477-Data.db
-rw-r--r-- 1 cassandra cassandra 3.9G Jan 7 14:50 /var/lib/cassandra/data/db/table/db-table-ka-210834-Data.db
-rw-r--r-- 1 cassandra cassandra 4.1G Jan 16 04:47 /var/lib/cassandra/data/db/table/db-table-ka-211203-Data.db
-rw-r--r-- 1 cassandra cassandra 997M Jan 19 08:56 /var/lib/cassandra/data/db/table/db-table-ka-211292-Data.db
-rw-r--r-- 1 cassandra cassandra 1.1G Jan 20 23:24 /var/lib/cassandra/data/db/table/db-table-ka-211389-Data.db
-rw-r--r-- 1 cassandra cassandra 1.1G Jan 22 00:03 /var/lib/cassandra/data/db/table/db-table-ka-211479-Data.db
-rw-r--r-- 1 cassandra cassandra 285M Jan 22 11:33 /var/lib/cassandra/data/db/table/db-table-ka-211495-Data.db
-rw-r--r-- 1 cassandra cassandra 51M Jan 22 12:16 /var/lib/cassandra/data/db/table/db-table-ka-211500-Data.db
-rw-r--r-- 1 cassandra cassandra 16M Jan 22 12:31 /var/lib/cassandra/data/db/table/db-table-ka-211501-Data.db
-rw-r--r-- 1 cassandra cassandra 16M Jan 22 12:46 /var/lib/cassandra/data/db/table/db-table-ka-211502-Data.db
Table is user to store some metrics as time series. Every row inserted is set with TTL of one month.
-
First thing I noticed here is existance of sstable much older than TTL, possibly because of not optimal compaction strategy for time series data. Is it safe to alter the table and change compaction strategy to DTCS now? What compaction subproperties should be best regarding TTL of one month?
-
Since all data is inserted with the same TTL, I guess it's better to set default TTL on table. According to Datastax:
Default TTL on a table is another natural fit
for time-series data. It allows you to avoid costly compactions that
run solely for the purpose of dropping tombstone and/or TTLed cells.
Actually, it allows the compaction logic to be much more efficient,
simply dropping the sstable when the newest cell timestamps are old
enough to indicate all of the sstable is past its shelf life. This
data is kept in the table metadata when it is written, making it easy
to check. When default TTL is used in conjunction with a time-series
workload, you can further reduce the compaction load associated with
data density. When used in conjunction with DTCS, you can almost
eliminate the geometric loads associated with high density on the
compaction side of the system.Is it safe to do it now?
-
Direction of clustering order is not optimal for getting last value.
select * from table where sid='XXX' limit 1
gives the first entry, not last.
Is it possible to change clustering order to DESC, at this moment?
Best Answer
You can't, because that would require rewriting all your data on disk in a different order, while resorting at runtime until the rewrite was done, which would result in an unacceptable hit to performance. You'll need to create a new table and bulk-load it in.