How to model data from arbitrary CSV’s in a database

database-designdatabase-recommendation

I am setting up a web interface for data management. Users upload CSV or similarly structured files, and I want to store these in the database so they can do manipulations on them — filtering, sorting, graphing, etc.

I do not know how to model this properly in a database. I have several ideas, but none seem to be the right way to do it.

  1. Make a new table for each uploaded CSV. This would mean each column could be appropriately typed (integers, strings, dates, etc.) and each record would trivially correspond to a line in the CSV file. This seems like a natural conceptualization of the problem — but would performance become an issue if I had to make a new table for each uploaded file?

  2. Make a table where each record represents a dataset (CSV) and have other tables where data points have the id their dataset in their record. This would mean that all the data from a given dataset is spread across different tables and there would be lots of redundancy (since each data point would store the id of the dataset). However, it would mean tables would not have to be created per-dataset.

  3. Other variations on 2. Most of my other thoughts were variations on number 2 with various amounts of indirection.

My question is essentially "How do I model this properly?", that is, with the ability to scale reasonably.

Most of the data will be scientific, so how do I deal with many data sets of sizes varying from trivial (say, 10 columns and 100 rows) to massive (hundreds of columns and thousands/millions of rows)?

tl;dr: How do I model arbitrary data from an arbitrary number of well-formed CSVs in a database, and would a new table for each CSV perform acceptably?

Best Answer

The question is so wide open that it is hard to say much.

And why are you using a database at all? Excel does everything you've described so far.

Seriously though, avoid making an inner system in your database that can store any kind of data. You have one already, called a DATABASE. Make code that creates the tables you need. What's so bad about that? I say option 1.

You know, if you want infinite flexibility how about a square-mile white board?