MySQL – Why WHERE Clause Query is Very Slow

MySQLperformance

I was running a query, the rows of the table are about 10 millions:

select * from zhihu.zhihu_answer_meta  where voteup_count>100000 limit 5;

However, it took more than 20 mins to finish.

I also tried other condition and found that it ran fast (in a second) with small number in where clause:

select * from zhihu.zhihu_answer_meta  where voteup_count>10 limit 5;
select * from zhihu.zhihu_answer_meta  where voteup_count>100 limit 5;

the DDL is as below:

CREATE TABLE `zhihu_answer_meta` (
  `answer_id` bigint(20) NOT NULL,
  `author_name` varchar(500) DEFAULT NULL,
  `reward_member_count` int(11) DEFAULT NULL,
  `answer_created` datetime DEFAULT NULL,
  `voteup_count` int(11) DEFAULT NULL,
  `is_advertiser` tinyint(1) DEFAULT NULL,
  `author_url_token` varchar(500) DEFAULT NULL,
  `comment_count` int(11) DEFAULT NULL,
  `is_org` tinyint(1) DEFAULT NULL,
  `question_type` varchar(100) DEFAULT NULL,
  `question_created` datetime DEFAULT NULL,
  `author_type` varchar(100) DEFAULT NULL,
  `insert_time` datetime DEFAULT NULL,
  `badge_num` int(11) DEFAULT NULL,
  `answer_updated` datetime DEFAULT NULL,
  `gender` tinyint(4) DEFAULT NULL,
  `reward_total_money` int(11) DEFAULT NULL,
  `author_id` varchar(300) DEFAULT NULL,
  `can_comment` tinyint(1) DEFAULT NULL,
  `question_id` bigint(20) DEFAULT NULL,
  UNIQUE KEY `answer_index` (`answer_id`,`insert_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 

So why is it slow and is there any way to optimize?

Thanks for any advice.

Best Answer

You will need to define an index on the voteup_count column for the query to be performant. Without this index, this query will be scanning all the rows until it reaches the first value of the Range condition.

Assuming that the data tree is in such a way that rows with low voteup_count values come first (basically, in ascending order). In that scenario, MySQL does not need to scan a lot of rows, and stops as soon as the LIMIT is achieved.

To make the query fast in all scenarios, define the following index:

ALTER TABLE zhihu.zhihu_answer_meta ADD INDEX voteup_count_idx (voteup_count) ;

Also, note that LIMIT wihout ORDER BY is non-deterministic, as data is basically stored as an unordered set, so you can get random set of rows.

Also, another optimization tip is to avoid using SELECT *, and instead specify the column name(s) explicitly. Do read: Why is SELECT * considered harmful?