MySQL – Storing a large number of small entries in MySQL

mysql, mysql-5, schema

I have a database that stores readings from numerous gauges. Depending on the gauge, there may be data every 1-5 minutes, with multiple readings per timestamp (e.g., a weather station that records 5 different readings each time).

I currently store the data by date: everything for a particular gauge on a single date is grouped together and stored as XML in a TEXT column.
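
Concretely, the current layout is roughly like this (table and column names are just for illustration, not the actual schema):

CREATE TABLE gauge_daily (
    gauge_id  INT UNSIGNED NOT NULL,   -- which gauge the blob belongs to
    obs_date  DATE         NOT NULL,   -- the day the readings cover
    readings  TEXT,                    -- XML blob with all of that day's timestamps/readings
    PRIMARY KEY (gauge_id, obs_date)
) ENGINE=InnoDB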

Would it be more efficient to store this data as individual rows?

I'm trying to decide which is the smarter solution long term: fewer rows with more data per row, or many more, smaller rows.

It is a read-heavy environment.

EDIT:

By 'efficient' I mostly mean speed and resource usage: if I pull back a year's worth of data, which approach would be the least processor/memory intensive?

I almost exclusively use this data to graph for clients. The number of readings per timestamp is variable, from 1 to 6 readings.

The tables currently use the InnoDB storage engine.

EDIT 2:

I'm leaning towards keeping the data in MySQL for ease of access/updating.

What I'm really asking is whether it would be smarter to save each timestamp (which may have multiple readings) as an individual row, or to group multiple timestamps (e.g., up to a day's worth) into a single row.

Thanks

Best Answer

It depends on exactly what you're trying to do with the data -- if it's only being used for graphing, and you don't need high resolution as the data ages (e.g., you're not trying to plot a full-resolution graph of a single day from a year ago), you might actually want to look at RRDTool rather than a relational database.

If you're ever going to need to do analysis of the values (how often does a place get above 80°F?), you'll want to store discrete values, not an XML structure; but you could also use flat-file formats designed for this type of data (e.g., CDF, NetCDF ... maybe even HDF).
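
With discrete values stored one reading per row, that kind of question is a single aggregate query. A rough sketch, assuming an observations table with date_obs and temp columns like the query in the update below:

SELECT   date(date_obs) AS obs_day, count(*) AS readings_above_80
FROM     observations
WHERE    temp > 80
GROUP BY date(date_obs)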

Update:

I'd store each timestamp as a separate record, as that makes it easier to adjust the granularity when graphing. For instance, to extract the low/high/mean temperature for each hour:

SELECT   min(date_obs),min(temp),max(temp),avg(temp)
FROM     observations
WHERE    date_obs between ...
GROUP BY floor(unix_timestamp(date_obs)/3600)
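
That query assumes a table shaped roughly like this (column names taken from the query above; everything else is an assumption):

CREATE TABLE observations (
    gauge_id  INT UNSIGNED NOT NULL,   -- assumed: which gauge the reading came from
    date_obs  DATETIME     NOT NULL,   -- timestamp of the individual reading
    temp      DECIMAL(5,2)     NULL,   -- one column per reading type; NULL if a gauge doesn't report it
    -- further reading columns (humidity, wind_speed, ...) as needed
    PRIMARY KEY (gauge_id, date_obs),
    KEY idx_date_obs (date_obs)
) ENGINE=InnoDB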

Also, it allows you to change the cadence for the measurements without needing to change the table structure.
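
The same goes for graphing granularity: rolling the per-reading rows up to, say, daily resolution is just a different bucket size in the GROUP BY:

SELECT   min(date_obs),min(temp),max(temp),avg(temp)
FROM     observations
WHERE    date_obs between ...
GROUP BY floor(unix_timestamp(date_obs)/86400)   -- 86400 seconds = 1 day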