Query 1, tested in SQLFiddle-1
SET @ward_id_to_check = 1 ;
SELECT
st.patient_id,
st.bed_id AS starting_bed_id, -- the first bed a patient uses; can be omitted
st.admitted,
MIN(en.discharged) AS discharged
FROM
( SELECT patient_id, bed_id, admitted, discharged
FROM t
WHERE t.ward_id = @ward_id_to_check
AND NOT EXISTS
( SELECT *
FROM t AS prev
WHERE prev.ward_id = @ward_id_to_check
AND prev.patient_id = t.patient_id
AND prev.discharged = t.admitted
)
) AS st
JOIN
( SELECT patient_id, admitted, discharged
FROM t
WHERE t.ward_id = @ward_id_to_check
AND NOT EXISTS
( SELECT *
FROM t AS next
WHERE next.ward_id = @ward_id_to_check
AND next.patient_id = t.patient_id
AND next.admitted = t.discharged
)
) AS en
ON st.patient_id = en.patient_id
AND st.admitted <= en.admitted
GROUP BY
st.patient_id,
st.admitted ;
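Query 1 can be sanity-checked on a tiny data set. Here is a sketch using SQLite via Python (the MySQL user variable is replaced by a named parameter, and the table plus sample rows are invented purely for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE t (
    patient_id INTEGER,
    ward_id    INTEGER,
    bed_id     INTEGER,
    admitted   TEXT,
    discharged TEXT)""")

# Patient 1 changes bed but stays in ward 1 without a gap;
# patient 2 has a single stay.
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, '2015-01-01', '2015-01-05'),
    (1, 1, 2, '2015-01-05', '2015-01-10'),
    (2, 1, 3, '2015-01-03', '2015-01-06'),
])

rows = conn.execute("""
SELECT st.patient_id, st.bed_id AS starting_bed_id,
       st.admitted, MIN(en.discharged) AS discharged
FROM ( SELECT patient_id, bed_id, admitted, discharged
       FROM t
       WHERE t.ward_id = :w
         AND NOT EXISTS
             ( SELECT * FROM t AS prev
               WHERE prev.ward_id = :w
                 AND prev.patient_id = t.patient_id
                 AND prev.discharged = t.admitted )
     ) AS st
JOIN ( SELECT patient_id, admitted, discharged
       FROM t
       WHERE t.ward_id = :w
         AND NOT EXISTS
             ( SELECT * FROM t AS nxt
               WHERE nxt.ward_id = :w
                 AND nxt.patient_id = t.patient_id
                 AND nxt.admitted = t.discharged )
     ) AS en
  ON st.patient_id = en.patient_id
 AND st.admitted <= en.admitted
GROUP BY st.patient_id, st.admitted
ORDER BY st.patient_id
""", {"w": 1}).fetchall()

for r in rows:
    print(r)
```

The two bed rows of patient 1 collapse into one stay from 2015-01-01 to 2015-01-10, while patient 2's single stay passes through unchanged.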
Query 2, which is the same as Query 1 but without the derived tables. With proper indexes this will probably get a better execution plan. Test in SQLFiddle-2:
SET @ward_id_to_check = 1 ;
SELECT
st.patient_id,
st.bed_id AS starting_bed_id,
st.admitted,
MIN(en.discharged) AS discharged
FROM
t AS st -- starting period
JOIN
t AS en -- ending period
ON en.ward_id = @ward_id_to_check
AND st.patient_id = en.patient_id
AND NOT EXISTS
( SELECT *
FROM t AS next
WHERE next.ward_id = @ward_id_to_check
AND next.patient_id = en.patient_id
AND next.admitted = en.discharged
)
AND st.admitted <= en.admitted
WHERE
st.ward_id = @ward_id_to_check
AND NOT EXISTS
( SELECT *
FROM t AS prev
WHERE prev.ward_id = @ward_id_to_check
AND prev.patient_id = st.patient_id
AND prev.discharged = st.admitted
)
GROUP BY
st.patient_id,
st.admitted ;
Both queries assume that there is a unique constraint on (patient_id, admitted). If the server runs with strict ANSI settings (ONLY_FULL_GROUP_BY), bed_id should be added to the GROUP BY list.
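A sketch of that unique constraint, plus one possible index for the prev/next NOT EXISTS probes (the index choice is my assumption, not something the queries require; measure against your own workload). Shown in SQLite via Python so the duplicate check can be demonstrated:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE t (
    patient_id INTEGER NOT NULL,
    ward_id    INTEGER NOT NULL,
    bed_id     INTEGER NOT NULL,
    admitted   TEXT NOT NULL,
    discharged TEXT,
    UNIQUE (patient_id, admitted))""")

# Candidate covering index for the correlated subqueries (an assumption).
conn.execute("CREATE INDEX ix_t_ward ON t (ward_id, patient_id, discharged, admitted)")

conn.execute("INSERT INTO t VALUES (1, 1, 1, '2015-01-01', '2015-01-05')")
try:
    # Same patient, same admission timestamp: rejected by the constraint.
    conn.execute("INSERT INTO t VALUES (1, 2, 9, '2015-01-01', '2015-01-07')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```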
You could expand your activities table so that instead of a single timestamp column, it includes two timestamps, ts_from and ts_to, that indicate the time period for which the row's data is applicable. As long as no data changes, you do not insert any new rows into your table. When data does change, you insert the new data into a new row and update the old row's ts_to column.
Then you can create a calendar table that contains all timestamps of interest to you (in your case that table would contain a row for every day and every hour in the period of time that is of interest to you). Finally, you get the result by cross-joining the two tables and keeping only those records whose date (from the calendar table) falls between ts_from and ts_to.
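The "close the old row, insert the new one" step can be sketched like this (SQLite via Python; the table and column names follow the time_data example, and the helper function is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE time_data (d_from date, d_to date, val1 int, val2 int)")
conn.execute("INSERT INTO time_data VALUES ('2015-01-01', NULL, 10, 15)")

def record_change(conn, day, val1, val2):
    """Close the currently open row and open a new one starting at `day`.
    (Hypothetical helper: the open row is the one whose d_to is NULL.)"""
    conn.execute("UPDATE time_data SET d_to = ? WHERE d_to IS NULL", (day,))
    conn.execute("INSERT INTO time_data VALUES (?, NULL, ?, ?)", (day, val1, val2))

record_change(conn, '2015-01-15', 7, 18)
rows = conn.execute("SELECT * FROM time_data ORDER BY d_from").fetchall()
print(rows)
```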
For example (my example uses only dates and not datetimes, but you can change that easily):
create table calendar (
d date
);
create table time_data (
d_from date,
d_to date,
val1 int(10),
val2 int(10)
);
insert into time_data values('2015-01-01', '2015-01-15', 10, 15);
insert into time_data values('2015-01-15', '2015-02-21', 7, 18);
insert into time_data values('2015-02-21', '2015-03-04', 1, 55);
insert into time_data values('2015-03-04', '2015-04-21', 3, 22);
insert into time_data values('2015-04-21', null, 9, 99);
I filled the calendar table with all dates from 2015-01-01 to now and did the simple query
select d, val1, val2
from calendar, time_data
where d >= d_from and d < ifnull(d_to, curdate())
order by d
The result set contains an entry for every date in the range 2015-01-01 to 2015-05-01, together with the val1 and val2 values from the time_data row whose period covers that date.
For a nice procedure to fill the calendar table, see this article.
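The linked article is not reproduced here, but filling a calendar table can also be done with a short script. A sketch in Python with SQLite, using the same date range as the example above:

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calendar (d date)")

# One row per day from 2015-01-01 through 2015-05-01, inclusive.
d, stop = date(2015, 1, 1), date(2015, 5, 1)
days = []
while d <= stop:
    days.append((d.isoformat(),))
    d += timedelta(days=1)
conn.executemany("INSERT INTO calendar VALUES (?)", days)

n = conn.execute("SELECT COUNT(*) FROM calendar").fetchone()[0]
print(n)  # 121 days: 31 + 28 + 31 + 30 + 1
```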
However, since a cross join is an expensive operation, and you have large amounts of data, thoroughly test this (or any other) solution before applying it to your production system.
Best Answer
It depends on exactly what you're trying to do with the data. If it's only being used for graphing, and you don't need high resolution the further back it goes (e.g., you're not trying to plot a graph with full temporal resolution for a day from one year ago), you might actually want to look at RRDTool rather than a relational database.
If you're ever going to need to do analysis of the values (how often does a place get above 80°F?), you'll want to store discrete values, not an XML structure; but you could also use flat-file structures that are meant for dealing with this type of data (e.g., CDF, NetCDF, maybe even HDF).
Update:
I'd store each measurement as a separate record, as that makes it easier to adjust the granularity when graphing. For instance, to extract the high/low/mean for each hour:
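The original query did not survive here; the following is a hedged reconstruction (SQLite via Python, assuming a readings table with a ts timestamp and a temp value, one row per measurement — all of these names are my invention):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts TEXT, temp REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", [
    ('2015-05-01 10:05:00', 71.2),
    ('2015-05-01 10:25:00', 74.8),
    ('2015-05-01 10:45:00', 73.0),
    ('2015-05-01 11:10:00', 75.5),
])

# Truncate each timestamp to its hour, then aggregate within the hour.
rows = conn.execute("""
SELECT strftime('%Y-%m-%d %H:00', ts) AS hour,
       MAX(temp) AS high,
       MIN(temp) AS low,
       AVG(temp) AS mean
FROM readings
GROUP BY hour
ORDER BY hour
""").fetchall()
for r in rows:
    print(r)
```

In MySQL the same grouping could be expressed with DATE_FORMAT(ts, '%Y-%m-%d %H:00') instead of strftime.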
Also, it allows you to change the cadence for the measurements without needing to change the table structure.