A good way to store a large number of columns

database-design

I have a problem deciding how to store this data in my database. Any suggestions on the best way to do it? I don't know a hell of a lot about databases, I might add.

I have data coming in formatted like the sample below, but instead of the 4 columns shown there are approximately 240, so each timestamp has around 240 unique values associated with it:

Date/Time             200,00    202,50    205,00
2010.11.12 13:34:00   45,8214   43,8512   41,5369
2010.11.12 13:35:00   461,9364  454,2612  435,5222

Each row is also associated with a DataSite.

My first thought was to have a table like so:
DataID (pk), DataSiteID, ParameterID, Date, Value, with an index on DataSiteID, ParameterID and Date. The ParameterID refers to another table that stores the input column headers (200,00 202,50 205,00 …).
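For concreteness, here is a minimal sketch of that first design in SQL; the table and column names come from my description above, but the types and precisions are assumptions:

    -- Narrow/normalized design: one row per (site, wavelength, timestamp) reading.
    -- Types are assumptions; adjust precision to match the instrument's output.
    CREATE TABLE Parameter (
        ParameterID  INT           PRIMARY KEY,
        Wavelength   DECIMAL(6,2)  NOT NULL   -- e.g. 200.00, 202.50, 205.00
    );

    CREATE TABLE Data (
        DataID       BIGINT        PRIMARY KEY,  -- BIGINT: row counts get large fast
        DataSiteID   INT           NOT NULL,
        ParameterID  INT           NOT NULL REFERENCES Parameter (ParameterID),
        Date         DATETIME      NOT NULL,
        Value        DECIMAL(10,4) NOT NULL
    );

    CREATE INDEX IX_Data_Site_Param_Date
        ON Data (DataSiteID, ParameterID, Date);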

My second thought was simply to have a table with all 240-odd columns. I have come up with a few other ways, but they are pretty unsatisfactory as well.
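A sketch of that wide alternative, with one column per wavelength (the column names here are hypothetical, and only the first three of the ~240 value columns are shown):

    -- Wide design: one row per (site, timestamp), one column per wavelength.
    -- Column names like Value_200_00 are hypothetical placeholders.
    CREATE TABLE DataWide (
        DataSiteID    INT           NOT NULL,
        Date          DATETIME      NOT NULL,
        Value_200_00  DECIMAL(10,4),
        Value_202_50  DECIMAL(10,4),
        Value_205_00  DECIMAL(10,4),
        -- ... roughly 237 more wavelength columns ...
        PRIMARY KEY (DataSiteID, Date)
    );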

The problem I have with my first solution (not such a huge problem, but I don't like it) is that the Date and DataSiteID are repeated for all 240 values from a single input row, so it uses quite a bit of extra space.
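As a rough back-of-envelope (assuming 8-byte datetimes, 4-byte integer keys, an 8-byte surrogate key and 8-byte values, and ignoring per-row storage overhead, which would widen the gap further): the narrow design costs about 240 × (8 + 4 + 4 + 8 + 8) = 7,680 bytes ≈ 7.5 KB per site per timestamp, versus roughly 4 + 8 + 240 × 8 ≈ 1.9 KB for one wide row, so the repetition roughly quadruples the raw storage before indexes.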

There will be about 40 GB of data a year coming in (in the above text format), and the data will be searched by DataSite, Parameter and Date. The volume of incoming data will most likely quadruple within a year or so.

Any good ideas?
Thanks, James

edit: This is time series data, and the columns are measurements at different wavelengths. The data will typically be analysed within a relatively narrow range of wavelengths at a time. Extra wavelengths could also be added at some point in the future.
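For example, the kind of query I'd be running looks something like this, against the first design sketched above (names and literals are illustrative):

    -- Readings from one site, for a narrow wavelength band, over one day.
    -- Table and column names follow the earlier narrow-design sketch.
    SELECT d.Date, p.Wavelength, d.Value
    FROM   Data d
    JOIN   Parameter p ON p.ParameterID = d.ParameterID
    WHERE  d.DataSiteID = 42
      AND  p.Wavelength BETWEEN 200.00 AND 210.00
      AND  d.Date >= '2010-11-12' AND d.Date < '2010-11-13'
    ORDER BY d.Date, p.Wavelength;

That design would also make adding extra wavelengths later a plain INSERT into Parameter, where the wide table would need an ALTER TABLE.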

edit: Thanks for the answers guys, I really appreciate it 🙂 I think I can probably find time to run some experiments with 500 GB or so of test data. I'll post back with any conclusions 😉

Best Answer

You could make a case either way, but if the data is going to be used for analysis and you often want to see multiple columns at the same time, go with the wide table. Make sure you know your database's column-count and row-size limits. Make sure you get the datatypes right. If many of the columns will be null, SQL Server lets you optimize the table for that with sparse columns. You could also consider a NoSQL (Not Only SQL) solution for analysing this type of data.
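If you do go wide on SQL Server and many readings are null, the sparse-column option looks something like this (SPARSE is the actual SQL Server column attribute; the table and column names are illustrative):

    -- SQL Server sparse columns: NULLs consume no storage, at a small
    -- extra cost for non-NULL values. Sparse columns must be nullable.
    CREATE TABLE DataWideSparse (
        DataSiteID    INT        NOT NULL,
        [Date]        DATETIME2  NOT NULL,
        Value_200_00  REAL       SPARSE NULL,
        Value_202_50  REAL       SPARSE NULL,
        Value_205_00  REAL       SPARSE NULL,
        -- ... remaining wavelength columns, all SPARSE ...
        PRIMARY KEY (DataSiteID, [Date])
    );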

If this data is going to be used less for analysis and more for pulling back individual values, you might want to normalize it as described in your question.