Mysql – Optimize MySQL Table for Charting

datetimeindexMySQLtimestamp

Our application is collecting data points at different rates (between 500ms – 10,000ms) depending on user settings and that data is queried to plot points on a chart.

I started with the following table structure:

CREATE TABLE `monitored_parameter_data_378` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `command_id` int(11) DEFAULT NULL,
  `data_value` varchar(255) DEFAULT NULL,
  `system_unit` varchar(255) DEFAULT NULL,
  `unit` varchar(255) DEFAULT NULL,
  `created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Once we collected over 20,000 points (which will obviously keep growing) performance slowed down quite a bit. It is taking between 950-1,100ms to get results back so I changed the schema to the following in hopes of benefitting from having the timestamp column indexed:

CREATE TABLE `monitored_parameter_data_378_test` (
  `command_id` int(11) DEFAULT NULL,
  `data_value` varchar(255) DEFAULT NULL,
  `system_unit` varchar(255) DEFAULT NULL,
  `unit` varchar(255) DEFAULT NULL,
  `created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  KEY `CREATED_IDX` (`created`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Unfortunately I am not seeing any improvement in response time after making the change.

For testing and establishing a baseline I am using the following query:

SELECT * FROM <table_name> WHERE created BETWEEN '2017-01-01' AND '2017-03-13';

The indexed version of the table takes between 850ms-1,100ms to return so I am unsure of how best to proceed. When I run EXPLAIN on the queries I see that the index is being used and that 50% of the data is filtered which is much better than the 11.11% in the non-indexed version.

Does anyone have any insight on how I might take performance to the next level given the fairly simple structure and approach to pulling the data?

Note: the machine in question is Raspberry Pi

Update: I marked @a_vlad's answer as accepted yesterday because based on the original question it is correct. That said, I rewrote the logic to be done in 3 steps, first I get back the first/last point within the date range along with count of total points, second I calculate the interval between those points to based on the number of points the user wants to plot, last I query for the data by doing the following:

'SELECT * FROM monitored_parameter_data_378 WHERE id >= <variable containing first point in range> AND id <= <variable containing last point in range> AND id MOD <variable containing interval> = 0';

This gets the data back onto a chart in ~4 seconds in a table with 1,000,000 records.

Best Answer

First of all, better to combine both of your structures:

  • and have PK
  • and have index for "created"
  • as a variant, think about a natural PK - created + something which uniquely identifies a row
  • added - also think about units (system_unit) - if it is identification of your point(?), it may be a good idea to change it to INT format, it reduces memory and disk space usage for this column.

PK could be good for some other queries, and InnoDB still creates it even if you do not define it, but you can not use it in this case.

More importantly, response time depends not only on indexes (of course it MUST HAVE), but also from number of records. You can change the query to:

SELECT count(*) FROM <table_name> WHERE created BETWEEN '2017-01-01' AND '2017-03-13';

and it will show you - how many records the server must return.

The only few ways for feature improvements:

  • reduce timeframe
  • reduce number of columns - but this is not your case
  • faster server - SSD (if still not), increase memory for InnoDB buffer at least (if it is less than necessary)