Choosing a database for big data

database-design

Let's say I want to write a web service that stores home automation device readings such as thermostats, humidifiers, etc. The data would take the form of:

user_id | device_id | tag | value | timestamp

A user could have 5, 10, or 20 devices connected, and readings could be taken at 1-, 5-, or 10-minute intervals, both depending on account type. The service could record 10, 50, or 100 tags for each connected device.

This means I would theoretically need to support a batch insert of up to 2,000 rows per user per minute (20 devices × 100 tags, read once a minute).

Let's say I want to scale to 10,000 users. That means an upper limit of 20,000,000 rows inserted per minute across all users, or roughly 28.8 billion rows per day.
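
As a sanity check on those numbers, here is a minimal Python sketch of the arithmetic. It assumes every user is on the largest tier (20 devices, 100 tags per device, one reading per minute), so it is a worst case rather than an expected load; the constant names are only illustrative.

```python
# Worst-case tier from the question: every user has the maximum of everything.
DEVICES_PER_USER = 20
TAGS_PER_DEVICE = 100
READINGS_PER_MINUTE = 1      # one reading per tag per minute
USERS = 10_000
MINUTES_PER_DAY = 24 * 60    # 1,440

# One row is one (user_id, device_id, tag, value, timestamp) reading.
rows_per_user_per_minute = DEVICES_PER_USER * TAGS_PER_DEVICE * READINGS_PER_MINUTE
rows_per_minute = rows_per_user_per_minute * USERS
rows_per_second = rows_per_minute // 60
rows_per_day = rows_per_minute * MINUTES_PER_DAY

print(f"rows per user per minute: {rows_per_user_per_minute:,}")  # 2,000
print(f"rows per minute (total):  {rows_per_minute:,}")           # 20,000,000
print(f"rows per second (total):  {rows_per_second:,}")           # 333,333
print(f"rows per day (total):     {rows_per_day:,}")              # 28,800,000,000
```

So the sustained write rate to plan for is on the order of 333,000 rows per second in the worst case.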

What kind of databases should I be considering to handle this kind of load, and how would they scale? I'd especially be worried about storage space; I think it would need to make use of cloud storage.
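
To put the storage worry in rough numbers, here is a back-of-the-envelope Python sketch. The 50 bytes per row is purely an assumption of mine (a guess at the raw size of the five fields); it ignores compression, indexes, and replication, all of which depend on the database chosen. The function name is hypothetical as well.

```python
def estimate_daily_storage_bytes(rows_per_day: int, bytes_per_row: int) -> int:
    """Raw bytes written per day, ignoring compression, indexes and replication."""
    return rows_per_day * bytes_per_row


# Worst case from the question: 20 devices x 100 tags x 10,000 users, every minute.
ROWS_PER_DAY = 20 * 100 * 10_000 * 24 * 60   # 28,800,000,000
ASSUMED_BYTES_PER_ROW = 50                   # assumption, not a measured value

daily_bytes = estimate_daily_storage_bytes(ROWS_PER_DAY, ASSUMED_BYTES_PER_ROW)
daily_terabytes = daily_bytes / 1e12         # decimal terabytes

print(f"about {daily_terabytes:.2f} TB of raw data per day")  # about 1.44 TB
```

Under that assumption the worst case comes to roughly 1.44 TB of raw readings per day before any overhead, so the retention period and any downsampling of old readings would matter as much as the choice of database.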

Best Answer

I've heard of a company that is doing this. They're using a Cassandra cluster with several nodes, spread across two data centers.