I was running a query, the rows of the table are about 10 millions
:
select * from zhihu.zhihu_answer_meta where voteup_count>100000 limit 5;
However, it took more than 20 mins to finish.
I also tried other condition and found that it ran fast (in a second) with small number in where clause:
select * from zhihu.zhihu_answer_meta where voteup_count>10 limit 5;
select * from zhihu.zhihu_answer_meta where voteup_count>100 limit 5;
the DDL is as below:
CREATE TABLE `zhihu_answer_meta` (
`answer_id` bigint(20) NOT NULL,
`author_name` varchar(500) DEFAULT NULL,
`reward_member_count` int(11) DEFAULT NULL,
`answer_created` datetime DEFAULT NULL,
`voteup_count` int(11) DEFAULT NULL,
`is_advertiser` tinyint(1) DEFAULT NULL,
`author_url_token` varchar(500) DEFAULT NULL,
`comment_count` int(11) DEFAULT NULL,
`is_org` tinyint(1) DEFAULT NULL,
`question_type` varchar(100) DEFAULT NULL,
`question_created` datetime DEFAULT NULL,
`author_type` varchar(100) DEFAULT NULL,
`insert_time` datetime DEFAULT NULL,
`badge_num` int(11) DEFAULT NULL,
`answer_updated` datetime DEFAULT NULL,
`gender` tinyint(4) DEFAULT NULL,
`reward_total_money` int(11) DEFAULT NULL,
`author_id` varchar(300) DEFAULT NULL,
`can_comment` tinyint(1) DEFAULT NULL,
`question_id` bigint(20) DEFAULT NULL,
UNIQUE KEY `answer_index` (`answer_id`,`insert_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
So why is it slow and is there any way to optimize?
Thanks for any advice.
Best Answer
You will need to define an index on the
voteup_count
column for the query to be performant. Without this index, this query will be scanning all the rows until it reaches the first value of the Range condition.Assuming that the data tree is in such a way that rows with low
voteup_count
values come first (basically, in ascending order). In that scenario, MySQL does not need to scan a lot of rows, and stops as soon as theLIMIT
is achieved.To make the query fast in all scenarios, define the following index:
Also, note that
LIMIT
wihoutORDER BY
is non-deterministic, as data is basically stored as an unordered set, so you can get random set of rows.Also, another optimization tip is to avoid using
SELECT *
, and instead specify the column name(s) explicitly. Do read: Why is SELECT * considered harmful?