What database supports over a thousand fields and tables

nosql

I have a requirement to store several csv files with several fields in an database. This data does not need to be searchable but i need to query the data by Id. I will also need SQL to query the data so this may rule out many nosql options. Considering, each csv file is huge I will need to put each csv file in a separate table. i do not need to perform joins, subqueries or anything complicated – I just need to query by Id. Since the database will be getting hit hard with thousands of requests per hour horizontal scaling would be nice. Is there a database that can handle these requirements?

I have checked out MongoDB but it does not support sql. MongoDB would have been perfect if it supported basic sql.

Best Answer

My recommendation would be for you to check out Cassandra. In my experience, and based on the requirements you seem to have, it seems like it would be a good fit.

It appears to me that you have essentially have these requirements:

(1) Key-value storage (look up data based on ID)

(2) No need for JOIN or sub-SELECT

(3) Need for a SQL-like language

(4) Horizontal scalability

As for (1), Cassandra will hash the primary key (in your case you can use a single column id as your primary key) and which node in the cluster stores the data, and the Cassandra internals ensure look up by primary key is very fast. Cassandra doesn't support JOIN as in (2), but you appear to be OK with that.

Since you are essentially doing key-value lookups, I'm actually not sure why you need an SQL language. Perhaps you are wanting to not only look up data by primary key, but also perform aggregations and use GROUP BY? In this case, Cassandra won't work, but if you only need a simple SQL language to retrieve data as per (3), Cassandra has CQL, which may serve your needs.

Horizontal scalability is built right in to Cassandra, covering (4).

All that being said, Cassandra isn't a silver bullet. It's a great technology for certain applications, and it sounds like it fits you needs. Still, if you want aggregations, or if you want to query for multiple primary keys in a single query (e.g. SELECT * FROM table WHERE id IN (...)), then you may run in to some issues.

Now, if you can let go of constraint (4), I'd 100% be promoting PostgreSQL. You stated the database will be getting hit with thousands of requests per hour, but that's nothing for a database like PostgreSQL, if your queries are only individual primary key lookups, and you aren't trying to transfer large amounts of data in those queries.

Best of luck! If you provide more info based on my feedback, perhaps the community here can help you narrow down your selection even more.