Optimization of Large MySQL Weather Archive Table

MySQLoptimization

I have a table that has an archive of daily weather conditions around the globe. There are around 150 million rows and the table is 1.7Gb excluding indexes.

The table is an archive so there are no writes to it. It is a MyISAM table.

Queries are showing up in my slow log as taking up to 45 seconds. These are well cached by the website, but this seems excessive.

CREATE TABLE `archive` (
  `latitude` float NOT NULL,
  `longitude` float NOT NULL,
  `condition` decimal(3,1) NOT NULL,
  `month` tinyint(4) NOT NULL DEFAULT '0',
  `day` tinyint(4) NOT NULL DEFAULT '0'
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

ALTER TABLE `archive`
  ADD KEY `longitude` (`longitude`),
  ADD KEY `month` (`month`);

I have indexes on longitude and month (as I usually run a monthly average for a location). Just to note – I can't have a primary index on just longitude and latitude as these are all repeated for each day and month.

Here is an example of one of the slow queries (actually the only query type):

SELECT AVG(condition) AS average, MIN(condition) AS min_temp, MAX(condition) AS max_temp FROM `archive` WHERE `latitude`=42.75 AND `longitude`=132.75 AND `month`=4;

Server info:

64Gb RAM
key_buffer_size= 256M

Any thoughts on how I can speed things up? I have considered breaking the table up but am not sure on the best way forward.

Thanks.

Best Answer

Too slow and there is a better index? No brainer. Add the index. But... If you already have INDEX(x) drop it as you add any index starting with x. (I'm thinking of longitude and RMathis' comment.

You are likely to have trouble with things like latitude=42.75. latitude is FLOAT; 42.75 is (I think) DOUBLE. In this case, 42.75 can be represented exactly in both FLOAT and DOUBLE. But 42.76 cannot, hence it may not find a match due to roundoff.

A pair of floats consumes 8 bytes. See this for other representation choices. For weather stations that are no closer than a mile, DECIMAL(4,2)/(5,2) would save space (5 bytes instead of 8) and avoid the potential roundoff problem.

You mentioned "one of the slow queries". To best serve you, let's see all the frequent queries, even those that are not slow now. Then we can suggest a set of indexes to best handle all of them.

(When discussing performance issues, please provide SHOW CREATE TABLE; there are several details missing from what you provided.)

How much RAM do you have? What is the value of key_buffer_size? (Recommend 20% of available RAM.) If you have a tiny RAM, then, I will impart some other advice.