Mysql – Sample a Mysql table to increase performance

MySQLperformance

we are storing all our clients traffic data in a MySQL 5.1.73 database (MyISAM) and then we use it to generate user and internal statistics/graphs, always searchinng for a client id and a date/hour. We insert data every few minutes and then we divide the data with php in 3 tables per month using an index table to track the tables (which table use per date interval).

Currently we have this table structure, and about 2.5 million of rows per table:

    data_12_2013_3;
+---------+---------+------+-----+------------+-------+
| Field   | Type    | Null | Key | Default    | Extra |
+---------+---------+------+-----+------------+-------+
| cdate   | date    | NO   | PRI | 0000-00-00 |       |
| chour   | char(8) | NO   | PRI |            |       |
| id      | int(11) | NO   | PRI | 0          |       |
| kbps_up | int(11) | YES  |     | NULL       |       |
| kbps_dw | int(11) | YES  |     | NULL       |       |
+---------+---------+------+-----+------------+-------+


show INDEX FROM data_12_2013_3;
+----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table          | Non_unique | Key_name    | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| data_12_2013_3 |          0 | PRIMARY     |            1 | id          | A         |        NULL |     NULL | NULL   |      | BTREE      |         |
| data_12_2013_3 |          0 | PRIMARY     |            2 | cdate       | A         |        NULL |     NULL | NULL   |      | BTREE      |         |
| data_12_2013_3 |          0 | PRIMARY     |            3 | chour       | A         |      306679 |     NULL | NULL   |      | BTREE      |         |
| data_12_2013_3 |          1 | idcdate     |            1 | id          | A         |         104 |     NULL | NULL   |      | BTREE      |         |
| data_12_2013_3 |          1 | idcdate     |            2 | cdate       | A         |        1140 |     NULL | NULL   |      | BTREE      |         |
+----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

The queries are done by joining the data from all the tables involved between two dates using UNION ALL, always searching for date and client. Example:

(SELECT * FROM data_07_2014_3 WHERE id=214291113 AND cdate >= '2014-07-21' AND cdate <= '2014-08-20') 
UNION ALL (SELECT * FROM data_08_2014_1 WHERE id=214291113 AND cdate >= '2014-07-21' AND cdate <= '2014-08-20')
UNION ALL (SELECT * FROM data_08_2014_2 WHERE id=214291113 AND cdate >= '2014-07-21' AND cdate <= '2014-08-20') ORDER BY cdate ASC, chour ASC;

EXPLAIN EXTENDED:

+----+--------------+------------------------+-------+---------------------------+---------+---------+------+------+----------+----------------+
| id | select_type  | table                  | type  | possible_keys             | key     | key_len | ref  | rows | filtered | Extra          |
+----+--------------+------------------------+-------+---------------------------+---------+---------+------+------+----------+----------------+
|  1 | PRIMARY      | data_07_2014_3         | range | PRIMARY,idcdate,cdate     | PRIMARY | 7       | NULL |  178 |    75.28 | Using where    |
|  2 | UNION        | data_08_2014_1         | range | PRIMARY,idcdate,cdate     | PRIMARY | 7       | NULL | 2493 |    75.01 | Using where    |
|  3 | UNION        | data_08_2014_2         | range | PRIMARY,idcdate,cdate     | PRIMARY | 7       | NULL | 2026 |    75.02 | Using where    |
| NULL | UNION RESULT | <union1,2,3>         | ALL   | NULL                      | NULL    | NULL    | NULL | NULL |     NULL | Using filesort |
+----+--------------+------------------------+-------+---------------------------+---------+---------+------+------+----------+----------------+

The problem comes with the query above, that involves one month of data, because it takes too long (about 20 seconds in some cases), because there are a lot of samples, unnecessaries in that case.

¿What could we do to improve the speed? I was thinking in sampling the data, because it's not necessary to process all of the rows to query statistics for one month. It would be better/quicker to query a sample of the measures, let's say a 5% of the rows por example. But how to do it? I can only think in creating sampled tables from the "raw" tables.

Thanks for your help

Best Answer

Performance will increase a lot lot better if you use INT UNSIGNED for cdate instead of DATE. You just have to get the UNIX_TIMESTAMP of cdate every time you insert and FROM_UNIXTIME in every query. In comparison, integer is handled by MySQL faster compared to date/string.