Mysql – Optimizing a simple query on a large table

MySQLmysql-5.5optimization

DB: MySQL 5.5.20 (WampServer, default configuration)
OS: Win 7
HDD: Western Digital 3TB Caviar Green, 3.5", IntelliPower, 64MB, Sata3 (WD30EZRX)
Memory: 8GB
MySQL my.ini: http://pastie.org/private/go9kaxlmlvirati2txbaa

Query in question:

SELECT name.id AS name_id, name.name, cast_info.id,
cast_info.role_id, cast_info.movie_id
FROM cast_info
LEFT JOIN name ON name.id = cast_info.person_id
WHERE cast_info.movie_id = 1000000
ORDER BY cast_info.movie_id ASC

It fetches all people who worked on a given movie. Problem is it can take anywhere from 0.1s to nearly 2.0s. That's too long. When the user needs to run it 10k times they might as well uninstall the application. Even I haven't had the patience to wait for it to finish.

edit: The time it takes to run the query is determined by the number of people who worked on it. Roughly 0.1s for each 10 people.

EXPLAIN:

*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: cast_info
type: ref
possible_keys: idx_mid,mpi
key: idx_mid
key_len: 4
ref: const
rows: 15
Extra:
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: name
type: eq_ref
possible_keys: PRIMARY,id_name_idx
key: PRIMARY
key_len: 4
ref: imdb.cast_info.person_id
rows: 1
Extra:

Tables:

CREATE TABLE cast_info (
id int(11) NOT NULL AUTO_INCREMENT,
person_id int(11) NOT NULL,
movie_id int(11) NOT NULL,
person_role_id int(11) DEFAULT NULL,
note text,
nr_order int(11) DEFAULT NULL,
role_id int(11) NOT NULL,
PRIMARY KEY (id),
KEY idx_pid (person_id),
KEY idx_mid (movie_id),
KEY idx_cid (person_role_id),
KEY cast_info_role_id_exists (role_id),
KEY mpi (movie_id,person_id,id)
) ENGINE=MyISAM AUTO_INCREMENT=33261692 DEFAULT CHARSET=utf8

CREATE TABLE name (
id int(11) NOT NULL AUTO_INCREMENT,
name varchar(110) NOT NULL,
imdb_index varchar(12) DEFAULT NULL,
imdb_id int(11) DEFAULT NULL,
gender varchar(1) DEFAULT NULL,
name_pcode_cf varchar(5) DEFAULT NULL,
name_pcode_nf varchar(5) DEFAULT NULL,
surname_pcode varchar(5) DEFAULT NULL,
md5sum varchar(32) DEFAULT NULL,
PRIMARY KEY (id),
KEY idx_name (name(6)),
KEY idx_imdb_id (imdb_id),
KEY idx_pcodecf (name_pcode_cf),
KEY idx_pcodenf (name_pcode_nf),
KEY idx_pcode (surname_pcode),
KEY idx_md5 (md5sum),
KEY id_name_idx (id,name)
) ENGINE=MyISAM AUTO_INCREMENT=4287972 DEFAULT CHARSET=utf8

Thanks!

edit: MyISAM is used because this is a local database, used by one local application, by one user. Only one query is executed at the same time. Also because IMDbPy takes at least a month to build the database with InnoDB…

edit: Query EXPLAIN after converting to InnoDB:

*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: cast_info
type: ref
possible_keys: mpi
key: mpi
key_len: 3
ref: const
rows: 23
Extra:
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: name
type: eq_ref
possible_keys: PRIMARY,id_name_idx
key: PRIMARY
key_len: 4
ref: imdb.cast_info.person_id
rows: 1
Extra:

Best Answer

It would run faster with InnoDB,

  • The PRIMARY KEY is 'clustered'; this makes the lookup into name faster.

  • A suitable setting of innodb_buffer_pool_size of, say, 5G (for your 8GB machine) would cache a lot of stuff, thereby minimizing the I/O for 10K operations.

You could also shrink the data (and decrease the I/O) by using MEDIUMINT UNSIGNED or SMALLINT UNSIGNED where appropriate.

There are redundant indexes that could (should) be removed; this would speed up the load. Also name(6) is probably useless.