DB: Amazon RDS MySQL (OS: Linux, 2 vCPU, Memory: 8GB)
I have a table with almost 14M rows of data.
CREATE TABLE `meterreadings` (
`Id` bigint(20) NOT NULL AUTO_INCREMENT,
`meterid` varchar(16) DEFAULT NULL,
`metervalue` int(11) DEFAULT NULL,
`date_time` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`Id`),
KEY `meterid` (`meterid`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;
As you can see, I use an index on meterid.
Another table which stores device IDs (around 100 rows of data)
CREATE TABLE `devices` (
`Id` bigint(20) NOT NULL AUTO_INCREMENT,
`meterid` varchar(16) DEFAULT NULL,
`location` varchar(8) DEFAULT NULL,
PRIMARY KEY (`Id`),
UNIQUE KEY `meterid_UNIQUE` (`meterid`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;
To get 15 minute aggregated data, I use the below query
SELECT AVG(metervalue) as value
, DATE_FORMAT(date_time, "%d %b %Y %H:%i") as label
FROM meterreadings
WHERE meterid IN (SELECT meterid from devices)
AND date_time BETWEEN '2018-07-23' AND '2018-07-24'
GROUP BY DATE(date_time), HOUR(date_time), MINUTE(date_time) DIV 15
ORDER BY date_time ASC;
Query performance is very bad – It takes approximately around 12 seconds to execute, and causes a temporary spike in DB server usage as well.
EXPLAIN on this query returned this:
1 SIMPLE devices index meterid_UNIQUE meterid_UNIQUE 19 125
Using where; Using index; Using temporary; Using filesort
1 SIMPLE meterreadings ref meterid meterid 19 devices.meterid 322
Using where
I dropped the index on meterreadings and surprisingly the query performance is better – almost about 6 seconds now. I am still wondering why?
EXPLAIN on the query after dropping the index
1 SIMPLE meterreadings ALL 14580167 Using where;
Using temporary; Using filesort
1 SIMPLE devices ref meterid_UNIQUE meterid_UNIQUE 19
meterreadings.meterid 1 Using index
I am currently doing my query operation on the table without index – Is there a way I can optimize the table / query to do the operation faster (like a composite index on two columns?)
[The table is growing approximately by around 40 rows per second]
Best Answer
You should play with it a bit, because it might not be clear beforehand what solution will produce the best results.
A few points to consider
It is very likely that an index on the date column will provide you with better selectivity, as it has a higher selectivity.
Composite indexes are usually a good idea, but please make sure to chose the order correctly
date_time, meterid
vs.meterid, date_time
. In most cases it makes more sense to leave columns with dense values (i.e dates, floats) to the end, as any column in the index following them is unlikely to have any effect. ( trymeterid, date_time
for an index.)Subselects might force the optimizer to use a specific plan. Try converting it into a join if possible.