Setup Cassandra Database

cassandra

I have a very simple Cassandra database with just one table containing the following columns:

  • device ID – long integer representing the device the 'value' came from – primary key
  • tag ID – long integer representing the identity of the 'value'
  • timestamp – long integer representing the date/time the 'value' was collected
  • value – decimal number or alphanumeric value, which is the data value itself

Each device transmits about 20 values every 30 seconds. We have about 500 devices, which will increase to 3000 over the next 18 months or so.
Once the data is stored it is never modified or deleted (at this time).
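
For a rough sense of the write load these figures imply, here is a back-of-the-envelope sketch. The function and constant names are illustrative only, and it assumes every device transmits all 20 values on every 30-second interval:

```python
# Rough ingest-rate estimate from the figures quoted above.
VALUES_PER_DEVICE = 20          # values sent per transmission
INTERVAL_SECONDS = 30           # seconds between transmissions
SECONDS_PER_DAY = 24 * 60 * 60


def writes_per_second(devices: int) -> float:
    """Average number of values written per second across all devices."""
    return devices * VALUES_PER_DEVICE / INTERVAL_SECONDS


def rows_per_day(devices: int) -> int:
    """Number of values stored per day across all devices."""
    transmissions_per_day = SECONDS_PER_DAY // INTERVAL_SECONDS
    return devices * VALUES_PER_DEVICE * transmissions_per_day


if __name__ == "__main__":
    for devices in (500, 3000):
        print(f"{devices} devices: "
              f"{writes_per_second(devices):.0f} writes/s, "
              f"{rows_per_day(devices):,} rows/day")
```

That works out to roughly 333 writes per second (about 28.8 million rows per day) at 500 devices, and 2000 writes per second (about 172.8 million rows per day) at 3000 devices.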

My Cassandra node sits on an AWS EC2 server with the system files on the root disk, the Cassandra data files on their own disk, and the commit log files on their own disk. All disks are currently general purpose SSD.

This data is accessed for presentation and analysis (read only) and 90% of all activity is on data that is less than 90 days old.

Question 1:
I want to be able to automatically transfer data records that are more than, say, 360 days old from SSD onto cheaper magnetic disk (still on EC2).
How should we set up the Cassandra database server to achieve this?

Question 2:
I have an existing MySQL database with over 5 years' worth of very similar data which I want to migrate into the Cassandra database.
Writing the migration tool is fairly straightforward. However, my concern is that when I run the migration tool, all the data will end up on the SSD because Cassandra treats it all as new data, rather than as data that has aged in real time. Is it possible to tell Cassandra, from the PHP migration tool, to put data older than 360 days onto the magnetic disk described in Question 1?
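
To make the age rule concrete, here is a minimal sketch of the check the migration tool would apply to each row. It is written in Python for illustration rather than PHP, the names are hypothetical, and it assumes the long-integer timestamp holds milliseconds since the Unix epoch, which I have not specified above. It only classifies a row by age; it does not by itself control where Cassandra stores the data.

```python
from datetime import datetime, timedelta, timezone

ARCHIVE_AGE = timedelta(days=360)  # threshold from Question 1


def is_archive_candidate(timestamp_ms: int, now: datetime) -> bool:
    """Return True if the value was collected more than 360 days before `now`.

    Assumes `timestamp_ms` is milliseconds since the Unix epoch (UTC) and
    that `now` is a timezone-aware datetime.
    """
    collected = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    return now - collected > ARCHIVE_AGE


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    two_years_ago_ms = int((now - timedelta(days=730)).timestamp() * 1000)
    last_week_ms = int((now - timedelta(days=7)).timestamp() * 1000)
    print(is_archive_candidate(two_years_ago_ms, now))
    print(is_archive_candidate(last_week_ms, now))
```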

Best Answer

Q2: If I understand your question correctly, I don't think you want to do that, nor should you use Cassandra if that's what you're trying to do. Cassandra is meant to spread data across nodes so that if one node fails, you don't lose your data or bring down the entire cluster. If you put specific data on one single hard drive, then you're compromising how Cassandra works.

What's your use case for Cassandra? Why don't you transfer the data from the SSD to a commodity disk, then just unplug the SSD and use it for something else? Why do you need Cassandra specifically? Do you have an insane number of writes? How much data are you working with? If you're read heavy, you might be better off with a different database.