I'm trying to understand whether I'm doing something wrong, or expecting too much.
I have an InnoDB table on a db.m1.large RDS (MySQL 5.6.13. 7.5 GB RAM) with ~1M rows.
The primary key is a composite one. Having 1 TIMESTAMP and 3 BIGINTs.
pk1 is the TIMESTAMP and pk2 is a BIGINT.
I don't have any other indices, only the primary key.
Both of these queries run for ~1.2 seconds.
select e3.pk2 from table e3 where e3.pk1 BETWEEN '2013-09-24 12:27:00.0' AND '2013-11-29 11:27:00.0' group by e3.pk2 order by null;
select distinct e3.pk2 from table e3 where e3.pk1 BETWEEN '2013-09-24 12:27:00.0' AND '2013-11-29 11:27:00.0' order by null;
Running describe gives me
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | e3 | range | PRIMARY | PRIMARY | 4 | NULL | 493481 | Using where; Using index; Using temporary |
innodb_buffer_pool_size is 5.7GB.
innodb_log_buffer_size is 8M.
max_heap_table_size is 256MB.
Is the performance ok? for selecting distinct over ~500k records? Should I be considering something else?
EDIT
create statement:
CREATE TABLE `table` (
`pk1` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`pk2` int(11) NOT NULL,
`pk3` bigint(20) NOT NULL DEFAULT '0',
`pk4` bigint(20) NOT NULL DEFAULT '0',
`col1` int(11) DEFAULT NULL,
`col2` bigint(20) DEFAULT '0',
`col3` bigint(20) DEFAULT '0',
`col4` bigint(20) DEFAULT '0',
PRIMARY KEY (`pk1`,`pk2`,`pk3`,`pk4`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Profiling:
+----------------------+----------+
| Status | Duration |
+----------------------+----------+
| starting | 0.000123 |
| checking permissions | 0.000023 |
| Opening tables | 0.000043 |
| init | 0.000052 |
| System lock | 0.000025 |
| optimizing | 0.000025 |
| statistics | 0.000139 |
| preparing | 0.000036 |
| Creating tmp table | 0.000059 |
| executing | 0.000016 |
| Sending data | 1.225919 |
| end | 0.000047 |
| removing tmp table | 0.000067 |
| end | 0.000025 |
| query end | 0.000024 |
| closing tables | 0.000033 |
| freeing items | 0.000085 |
| cleaning up | 0.000018 |
+----------------------+----------+
18 rows in set, 1 warning (0.01 sec)
EDIT 2
+----------------------+----------+----------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+---------------------------+------------------+-------------+
| Status | Duration | CPU_user | CPU_system | Context_voluntary | Context_involuntary | Block_ops_in | Block_ops_out | Messages_sent | Messages_received | Page_faults_major | Page_faults_minor | Swaps | Source_function | Source_file | Source_line |
+----------------------+----------+----------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+---------------------------+------------------+-------------+
| starting | 0.000123 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NULL | NULL | NULL |
| checking permissions | 0.000023 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | check_access | sql_parse.cc | 5328 |
| Opening tables | 0.000043 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | open_tables | sql_base.cc | 4911 |
| init | 0.000052 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_prepare_select | sql_select.cc | 1050 |
| System lock | 0.000025 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_lock_tables | lock.cc | 304 |
| optimizing | 0.000025 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | optimize | sql_optimizer.cc | 138 |
| statistics | 0.000139 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | optimize | sql_optimizer.cc | 381 |
| preparing | 0.000036 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | optimize | sql_optimizer.cc | 500 |
| Creating tmp table | 0.000059 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | create_intermediate_table | sql_executor.cc | 207 |
| executing | 0.000016 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | exec | sql_executor.cc | 110 |
| Sending data | 1.225919 | 1.276080 | 0.000000 | 36 | 20 | 0 | 16 | 0 | 0 | 0 | 3 | 0 | exec | sql_executor.cc | 187 |
| end | 0.000047 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_execute_select | sql_select.cc | 1105 |
| removing tmp table | 0.000067 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | free_tmp_table | sql_tmp_table.cc | 1852 |
| end | 0.000025 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | free_tmp_table | sql_tmp_table.cc | 1881 |
| query end | 0.000024 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_execute_command | sql_parse.cc | 5038 |
| closing tables | 0.000033 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_execute_command | sql_parse.cc | 5086 |
| freeing items | 0.000085 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_parse | sql_parse.cc | 6374 |
| cleaning up | 0.000018 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | dispatch_command | sql_parse.cc | 1764 |
+----------------------+----------+----------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+---------------------------+------------------+-------------+
EDIT 3
As requested the total number of distinct pk2 is: 318
Best Answer
Since there are few different values for
pk2
column, it might be more efficient to have an index with the reverse order(pk2, pk1)
:and then use the query:
Depending on the size of the tables and the values of the datetime range, you may get a plan that uses the primary index in the dependent subquery:
But with a larger range or larger table (we can't test too many rows at SQLFiddle), see the SQL-Fiddle, and another possible execution plan. This one I expect to be more efficient when you have few distinct values in the
pk2
column. With 318 distinct values, it will do 318 lookups using the newly added index which will be more efficient than the previous plan that first gets (possibly millions) rows that match the datetime range condition and then finds theDISTINCT
pk2 values among them: