MySQL Performance – Optimizing a Slow Query

MySQLperformance

I'm trying to optimize a ~100GB database. All data values are in one single innodb table, is this bad? Here's what a common query looks like:

SELECT `CensorCode`, `LocalDateTime`, `UTCOffset`, `DateTimeUTC`, `MethodID`, `SourceID`, `QualityControlLevelID`, `DataValue`, `OffsetTypeID`, `OffsetValue`, `d`.`SampleID`, `s`.`LabSampleCode`, `d`.`QualifierID`, `q`.`QualifierCode`
FROM (`datavalues` d)
LEFT JOIN `samples` s ON `d`.`SampleID` = `s`.`SampleID`
LEFT JOIN `qualifiers` q ON `d`.`QualifierID` = `q`.`QualifierID`
WHERE `d`.`SiteID` =  '15'
AND `d`.`VariableID` =  '8'
AND `d`.`LocalDateTime` >= '2009-04-18T00:02'
AND `d`.`LocalDateTime` <= '2009-04-20T00:02'

Here's what the explain statement on the above query looks like:

| id | select_type | table | type        | possible_keys                               | key                                         | key_len | ref               | rows    | Extra                                                                     |
+----+-------------+-------+-------------+---------------------------------------------+---------------------------------------------+---------+-------------------+---------+---------------------------------------------------------------------------+
|  1 | SIMPLE      | d     | index_merge | FK_DataValues_Variables,FK_DataValues_Sites | FK_DataValues_Sites,FK_DataValues_Variables | 4,4     | NULL              | 1390435 | Using intersect(FK_DataValues_Sites,FK_DataValues_Variables); Using where |
|  1 | SIMPLE      | s     | eq_ref      | PRIMARY                                     | PRIMARY                                     | 4       | odm.d.SampleID    |       1 |                                                                           |
|  1 | SIMPLE      | q     | eq_ref      | PRIMARY                                     | PRIMARY                                     | 4       | odm.d.QualifierID |       1 |   

Does this mean it's using indexes? If not, could someone explain how to create an index, I don't understand what it means. Any other suggestions would be greatly appreciated. Thanks in advance.

Also, here is the output from SHOW CREATE TABLE

| datavalues | CREATE TABLE `datavalues` (
  `ValueID` int(11) NOT NULL AUTO_INCREMENT,
  `DataValue` double NOT NULL,
  `ValueAccuracy` double DEFAULT NULL,
  `LocalDateTime` datetime NOT NULL,
  `UTCOffset` double NOT NULL,
  `DateTimeUTC` datetime NOT NULL,
  `SiteID` int(11) NOT NULL,
  `VariableID` int(11) NOT NULL,
  `OffsetValue` double DEFAULT NULL,
  `OffsetTypeID` int(11) DEFAULT NULL,
  `CensorCode` varchar(50) NOT NULL DEFAULT 'nc',
  `QualifierID` int(11) DEFAULT NULL,
  `MethodID` int(11) NOT NULL DEFAULT '0',
  `SourceID` int(11) NOT NULL,
  `SampleID` int(11) DEFAULT NULL,
  `DerivedFromID` int(11) DEFAULT NULL,
  `QualityControlLevelID` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`ValueID`),
  UNIQUE KEY `DataValues_UNIQUE_DataValues` (`DataValue`,`ValueAccuracy`,`LocalDateTime`,`UTCOffset`,`DateTimeUTC`,`SiteID`,`VariableID`,`OffsetValue`,`OffsetTypeID`,`CensorCode`,`QualifierID`,`MethodID`,`SourceID`,`SampleID`,`DerivedFromID`,`QualityControlLevelID`),
  KEY `FK_DataValues_Sources` (`SourceID`),
  KEY `FK_DataValues_QualityControlLevels` (`QualityControlLevelID`),
  KEY `FK_DataValues_OffsetTypes` (`OffsetTypeID`),
  KEY `FK_DataValues_CensorCodeCV` (`CensorCode`),
  KEY `FK_DataValues_Variables` (`VariableID`),
  KEY `FK_DataValues_Methods` (`MethodID`),
  KEY `FK_DataValues_Qualifiers` (`QualifierID`),
  KEY `FK_DataValues_Samples` (`SampleID`),
  KEY `FK_DataValues_Sites` (`SiteID`)
) ENGINE=InnoDB AUTO_INCREMENT=282507112 DEFAULT CHARSET=utf8 |

Best Answer

Question: Does this mean the query is using indexes?

The Using intersect(FK_DataValues_Sites,FK_DataValues_Variables) means that the query is using two indexes on table datavalues with the Index Merge Intersection Access Algorithm and then (Using where) the rows are additionally checked against some other conditions.


Question: Any other suggestions? (I guess about efficiency)

This is most probably not the most efficient way, especially if the query is retrieving a small perecentage of the table's rows.

For example, lets say that the SiteID = 15 condition narrows the search to 10% of the rows and the VariableID = 8 condition narrows the search to 5% of the rows. The two existing indexes can be used invidually or combined (with tthe help of the index merge intersection algorithm) to narrow the search to (0.10 * 0.05 =) 0.5% of the tables' rows (and that is what the MySQL engine is doing in this case.)

With a 280M rows table, this means that the first index can efficiently give us the 28M keys that match the 1st condition and the second index can give us the 14M rows that match the 2nd condition. Then, the index merge algorithm can further narrow it to the 1.4M keys (at the extra cost of this algorithm). Then, the 1.4M rows have to be retrieved and the 3rd condition (LocalDateTime >= '2009-04-18T00:02' AND LocalDateTime <= '2009-04-20T00:02) - that is not covered by any index - has to be checked for every one of these 1.4M rows.

If this 3rd condition narrows the results to, say 5% or 70K rows, the overall efficiency of the query will not be the best possible.

The best would be if there was an index that covered all 3 conditions. Fortunately, there is a way. A multi-column index on either (VariableID, SiteID, LocalDateTime) or on (SiteID, VariableID, LocalDateTime) would be enough to narrow the search to only the needed keys (with the previous example to only 70K) and then only those rows would be retrieved from the table.


Extra question: Do I have to remove the current index and then replace it with a multi-column index?

No, I wouldn't remove any index, just add the new one, unless I was sure I don't need an existing index.

But analyzing queries (and indexing) should be done after examining all the queries against the database and the requirements (which queries need to be extra fast and optimized, which ones we can afford to wait longer for them, how other operations will be affected if we add a lot of indexes, etc.)