Database with data inserted every 2 minutes

database-design database-theory

I have a site to develop in PHP and MySQL where, with a cron job, every two minutes I fetch 10 XML files from a server, parse each one, and insert the data into my database.
The data consists of simple strings (maximum 10 characters) spread over about 20 fields (string(10)).
So every two minutes I insert 10 records with 20 fields each (the number of XML files may grow).

The database contains other tables as well, but the biggest one is this table that receives new records every two minutes.

I want to know whether it is better to separate the database by year, e.g. database_2013, database_2014, ... creating the new database automatically when the year changes. I don't want this table to grow by gigabytes every year, so that ten years from now I have a huge database where queries are very slow.

Should I keep one database (is a database of many GB a problem?) or several databases split by year?

This is the table I have implemented:

CREATE TABLE IF NOT EXISTS `station` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `station_id` int(11) unsigned NOT NULL,
  `dewpoint` varchar(5) NOT NULL,
  `heat_index` varchar(10) NOT NULL,
  `pressure` varchar(10) NOT NULL,
  `relative_humidity` varchar(5) NOT NULL,
  `temperature` varchar(5) NOT NULL,
  `wind_degrees` varchar(10) NOT NULL,
  `wind_direction` varchar(10) NOT NULL,
  `wind_speed` varchar(5) NOT NULL,
  `windchill` varchar(10) NOT NULL,
  `rain_rate_in_per_hour` varchar(10) NOT NULL,
  `rain_rate_hour_high_in_per_hour` varchar(10) NOT NULL,
  `rain_storm_in` varchar(10) NOT NULL,
  `leaf_wetness` varchar(2) NOT NULL,
  `created` datetime NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
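For context, each cron run performs ten inserts of roughly this shape (the values below are made up purely for illustration):

INSERT INTO `station`
  (`station_id`, `dewpoint`, `heat_index`, `pressure`, `relative_humidity`,
   `temperature`, `wind_degrees`, `wind_direction`, `wind_speed`, `windchill`,
   `rain_rate_in_per_hour`, `rain_rate_hour_high_in_per_hour`, `rain_storm_in`,
   `leaf_wetness`, `created`)
VALUES
  (123, '12.3', '25.1', '1013.2', '65', '22.4', '180', 'S', '5.4', '22.4',
   '0.00', '0.00', '0.00', '0', NOW());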

Best Answer

Inserting 10 rows every 2 minutes results in (24*60/2)*10 = 7,200 rows per day, which is not a large number to worry about. It's also great that you're thinking ten years ahead, but don't lose time on premature optimization.
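To put that in perspective (rough figures): 7,200 rows/day × 365 ≈ 2.6 million rows per year, so roughly 26 million rows after ten years. With a sensible index on (station_id, created), that is still a modest InnoDB table.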

If your only concern is this table where you're inserting data every 2 minutes, there's no point in creating additional databases (one per year), so let's stick with this one table (Table_A).

Now, because this table will slowly grow over time, and you want your queries to run fast, you have several options:

  1. You can partition the table by certain criteria. Because you're using the table for a weather station, and your data is time-series values, your best option would be to partition by [station_id] and then by [created] (see the DDL sketch after the "Best Bet" paragraph below).

  2. Create a Table_A_Archive, where you can move data that has become too old to keep in Table_A (a sketch of the yearly move follows this list).

  3. How long do you intend to keep data in Table_A? Would it make sense to simply delete old rows once they become obsolete for your application?

... And so on.
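A minimal sketch of option 2, assuming Table_A_Archive has exactly the same columns as Table_A (the table names are the placeholders used above, and the cut-off date is just an example):

-- copy everything older than the cut-off into the archive table
INSERT INTO Table_A_Archive
SELECT * FROM Table_A
WHERE created < '2013-01-01 00:00:00';

-- then remove the archived rows from the live table
DELETE FROM Table_A
WHERE created < '2013-01-01 00:00:00';

If the volume is large, run the move in batches so you don't hold locks for too long.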

Best Bet: Partition the existing table by [station_id] and [created]. You will have "A" partitions, one per station, "B" sub-partitions, one per month, and a total of A×B possible partitions. Once you partition Table_A, do the same thing for Table_A_Archive, and at the end of each year move the data from Table_A to Table_A_Archive.
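A minimal sketch of such a scheme in MySQL (5.5+). Two MySQL-specific constraints apply: every column used in a partitioning expression must be part of every unique key, so the primary key has to become (id, station_id, created); and sub-partitions can only use HASH or KEY, so the monthly ranges become the outer partitions and the stations are hashed into sub-partitions. The table name, partition names and partition counts below are illustrative only:

CREATE TABLE IF NOT EXISTS `station_partitioned` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `station_id` int(11) unsigned NOT NULL,
  -- ... same measurement columns as `station` ...
  `created` datetime NOT NULL,
  PRIMARY KEY (`id`, `station_id`, `created`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
PARTITION BY RANGE (TO_DAYS(created))   -- one partition per month
SUBPARTITION BY HASH (station_id)       -- stations spread over sub-partitions
SUBPARTITIONS 4 (
  PARTITION p201301 VALUES LESS THAN (TO_DAYS('2013-02-01')),
  PARTITION p201302 VALUES LESS THAN (TO_DAYS('2013-03-01')),
  PARTITION pmax    VALUES LESS THAN MAXVALUE
);

New monthly partitions have to be added as time goes on (for example with ALTER TABLE ... REORGANIZE PARTITION on pmax), which is easy to script in the same cron that loads the data.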

**IMPORTANT:** After you create the partitioning scheme, keep in mind that all queries should include in the WHERE clause the conditions necessary so that the query hits as few partitions as possible.

Ex.

SELECT AVG(pressure)
FROM Table_A
WHERE station_id = 123
  AND created BETWEEN '2013-01-01 00:00:00' AND '2013-02-01 00:00:00'


The query above only touches the partitions that hold station 123 and months 1 and 2 of 2013; all other partitions are skipped (pruned).
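You can verify the pruning yourself; in MySQL 5.5/5.6 the partitions a query touches are shown with EXPLAIN PARTITIONS (from 5.7 on, plain EXPLAIN includes the partitions column by default):

EXPLAIN PARTITIONS
SELECT AVG(pressure)
FROM Table_A
WHERE station_id = 123
  AND created BETWEEN '2013-01-01 00:00:00' AND '2013-02-01 00:00:00';

The partitions column of the output lists only the partitions the optimizer will actually read.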
