Mysql – Why using where in indexed field is slow

MySQLoptimizationperformancequery-performancewhere

we have a big table. the table structure is :

CREATE TABLE `visit_logs` (
  `id` int(10) NOT NULL AUTO_INCREMENT,
  `user_id` int(10) NOT NULL,
  `ref_id` int(10) NOT NULL DEFAULT '0',
  `link_id` int(12) NOT NULL,
  `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `day_of_week` tinyint(1) NOT NULL,
  `jalali_month` tinyint(2) NOT NULL ,
  `value` int(9) NOT NULL ,
  `ref_share` float(4,2) NOT NULL DEFAULT '0.00',
  `balance` int(9) NOT NULL ,
  `ads_id` int(12) NOT NULL,
  `ip` varchar(15) NOT NULL,
  `hash` varchar(32) NOT NULL,
  `referer` text,
  `shw` text,
  PRIMARY KEY (`id`),
  KEY `user_id` (`user_id`),
  KEY `jalali_month` (`jalali_month`),
  KEY `value` (`value`),
  KEY `ads_id` (`ads_id`),
  KEY `ip` (`ip`),
  KEY `hash` (`hash`),
  KEY `link_id` (`link_id`),
  KEY `day_of_week` (`day_of_week`),
  KEY `ref_share` (`ref_share`),
  KEY `ref_id` (`ref_id`),
  KEY `date` (`date`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8;

these are some sample queries :

mysql> SELECT count(*),SUM( value ) AS `revenue` FROM `visit_logs` where id>13197856;
+----------+-----------+
| count(*) | revenue   |
+----------+-----------+
| 10443393 | 200940970 |
+----------+-----------+
1 row in set (6.02 sec)

mysql> SELECT count(*),SUM( value ) AS `revenue` FROM `visit_logs` where `date`  > NOW() - INTERVAL 90 day;
+----------+-----------+
| count(*) | revenue   |
+----------+-----------+
| 10443354 | 200940430 |
+----------+-----------+
1 row in set (9.24 sec)

mysql> SELECT count(*),SUM( value ) AS `revenue` FROM `visit_logs` ;
+----------+-----------+
| count(*) | revenue   |
+----------+-----------+
| 23641291 | 596178719 |
+----------+-----------+
1 row in set (3.32 sec)

as you can see , selecting all rows is faster than selecting some row filtered by id , also filter by id is faster than filter by timestamp . id and timestamp fields are indexed !

Mysql version is 5.1 .
my.conf :

thread_concurrency = 8
query_cache_size = 1G
thread_cache_size = 8
myisam_sort_buffer_size = 1G
read_rnd_buffer_size = 256M
read_buffer_size = 512M
sort_buffer_size = 512M
table_open_cache = 512
max_allowed_packet = 20M
key_buffer_size = 1G
#log = /var/log/mysqlq.log
wait_timeout=120
connect_timeout=50
#max_execution_time=60000
tmp_table_size=2G
max_heap_table_size=2G

Explain :

explain SELECT count(*),SUM( value ) AS `revenue` FROM `visit_logs` where id>13197856;
+----+-------------+------------+------+---------------+------+---------+------+----------+-------------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows     | Extra       |
+----+-------------+------------+------+---------------+------+---------+------+----------+-------------+
|  1 | SIMPLE      | visit_logs | ALL  | PRIMARY       | NULL | NULL    | NULL | 12978097 | Using where |
+----+-------------+------------+------+---------------+------+---------+------+----------+-------------+
1 row in set (0.00 sec)
mysql> explain SELECT count(*),SUM( value ) AS `revenue` FROM `visit_logs` force index (PRIMARY) where id>13197856;
+----+-------------+------------+-------+---------------+---------+---------+------+----------+-------------+
| id | select_type | table      | type  | possible_keys | key     | key_len | ref  | rows     | Extra       |
+----+-------------+------------+-------+---------------+---------+---------+------+----------+-------------+
|  1 | SIMPLE      | visit_logs | range | PRIMARY       | PRIMARY | 4       | NULL | 10604250 | Using where |
+----+-------------+------------+-------+---------------+---------+---------+------+----------+-------------+
1 row in set (0.00 sec)

mysql>  SELECT count(*),SUM( value ) AS `revenue` FROM `visit_logs` force index (PRIMARY) where id>13197856;
+----------+-----------+
| count(*) | revenue   |
+----------+-----------+
| 10534567 | 202721650 |
+----------+-----------+
1 row in set (10.73 sec)

mysql>  SELECT count(*),SUM( value ) AS `revenue` FROM `visit_logs` force index (PRIMARY) where id>13197856;
+----------+-----------+
| count(*) | revenue   |
+----------+-----------+
| 10534578 | 202721830 |
+----------+-----------+
1 row in set (10.75 sec)

mysql>  SELECT count(*),SUM( value ) AS `revenue` FROM `visit_logs`  where id>13197856;
+----------+-----------+
| count(*) | revenue   |
+----------+-----------+
| 10534588 | 202722030 |
+----------+-----------+
1 row in set (3.50 sec)

mysql>  SELECT count(*),SUM( value ) AS `revenue` FROM `visit_logs`  where id>13197856;
+----------+-----------+
| count(*) | revenue   |
+----------+-----------+
| 10534592 | 202722070 |
+----------+-----------+
1 row in set (3.53 sec)

mysql>  SELECT count(*),SUM( value ) AS `revenue` FROM `visit_logs` ;
+----------+-----------+
| count(*) | revenue   |
+----------+-----------+
| 12978727 | 262590120 |
+----------+-----------+
1 row in set (1.66 sec)

mysql>  SELECT count(*),SUM( value ) AS `revenue` FROM `visit_logs` ;
+----------+-----------+
| count(*) | revenue   |
+----------+-----------+
| 12978730 | 262590180 |
+----------+-----------+
1 row in set (1.65 sec)

Best Answer

Using an index requires bouncing back and forth between the index and the data.

In MyISAM, each index is a BTree sitting in the .MYI file. At the leaf node of the index is a pointer into the .MYD file. (Or, for FIXED, it will be a record number.) Your SELECTs are happy to scan linearly through the index (a BTree is efficient at that), but then for each row, it has to use the pointer to 'seek' into the .MYD file to pick up any fields that were not in the index.

Since you are fetching about half the table -- a lot of work.

Without using an index, MyISAM will scan the .MYD file (in whatever order it is in). This is more efficient than the back-and-forth of using an index.

Normally (without FORCE INDEX) the optimizer will decide to do a table scan if the estimated fraction of rows to do via an index lookup is more than somewhere around 20%. That is, usually the optimizer does the "right thing" by ignoring the index.

If this does not explain all the differences, then I suspect there were caching differences -- the data blocks (.MYD) cached in the OS and/or the index blocks (.MYI) cached in the key_buffer.

Note a difference between MyISAM and InnoDB: For MyISAM, the PRIMARY KEY is stored like any other index. For InnoDB, it is clustered with the data; so WHERE id > ... would be a range scan over the data, thereby more efficient, and you would not have seen 10sec times. (There are other significant differences, too.)