MySQL Query Optimization – Indexes & Joins 12 million+ records

MySQLperformancequery-performance

I'm running the following query:

SELECT
    `track_title`.`title`,
    `track`.`id`,
    `artist`.`name` artist,
    `album`.`title` album,
    `st_tag`.`tag`,
    `track`.`year`
FROM 
    `track`
JOIN `artist` ON `track`.`artist_id` = `artist`.`id`
JOIN `album` ON `track`.`album_id` = `album`.`id`
JOIN `track_title` ON `track`.`id` = `track_title`.`id`
JOIN `track_tags` ON `track`.`id` = `track_tags`.`track_id`
JOIN `st_tag` ON `track_tags`.`tag_id` = `st_tag`.`id`
WHERE
    `track_title`.`title_sch` LIKE "% love %"
ORDER BY
    `track_title`.`title`
LIMIT
    0,500;

Right now it takes at least 15 mins to execute. Both track and track_title have about 12 million records. Album and artist have at least a million each. Can this be sped up with better indexes? I've been playing with a combination of indexes in order to speed up the queries but i'm still having trouble getting it right.

Table structure:

CREATE TABLE `album` (
  `id` int(10) unsigned NOT NULL,
  `title` varchar(1000) default NULL,
  `title_sch` varchar(1000) default NULL,
  `flags` smallint(5) unsigned NOT NULL default '0',
  PRIMARY KEY  (`id`),
  KEY `title_sch` (`title_sch`(255)),
  KEY `album_title` USING BTREE (`title`(255)),
  KEY `album_cluster` USING BTREE (`id`,`title`(255),`title_sch`(255))
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `artist` (
  `id` int(10) unsigned NOT NULL,
  `name` varchar(1000) default NULL,
  `name_sch` varchar(1000) default NULL,
  `flags` smallint(5) unsigned NOT NULL default '0',
  PRIMARY KEY  (`id`),
  KEY `name_sch` (`name_sch`(255)),
  KEY `name` USING BTREE (`name`(255)),
  KEY `artist_name` USING BTREE (`name`(255)),
  KEY `artist_cluster` USING BTREE (`id`,`name`(255),`name_sch`(255))
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `st_tag` (
  `id` smallint(5) unsigned NOT NULL auto_increment,
  `tag` varchar(50) NOT NULL,
  `genre` smallint(5) unsigned NOT NULL,
  `parent` smallint(5) unsigned default NULL,
  `depth` tinyint(3) unsigned NOT NULL default '0',
  `display` tinyint(4) NOT NULL default '-1',
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=352 DEFAULT CHARSET=utf8;

CREATE TABLE `track` (
  `id` int(10) unsigned NOT NULL,
  `artist_id` int(10) unsigned NOT NULL,
  `album_id` int(10) unsigned NOT NULL,
  `status` smallint(5) unsigned NOT NULL default '0',
  `flags` smallint(5) unsigned NOT NULL default '0',
  `year` smallint(5) unsigned default NULL,
  `duration` smallint(5) unsigned NOT NULL,
  `track_no` smallint(5) unsigned default NULL,
  `disc_no` tinyint(3) unsigned default NULL,
  `explicit` tinyint(3) unsigned NOT NULL default '0',
  `popularity` tinyint(3) unsigned NOT NULL default '0',
  PRIMARY KEY  (`id`),
  KEY `album_id` (`album_id`),
  KEY `artist_id` (`artist_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `track_tags` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `track_id` int(10) unsigned NOT NULL,
  `tag_id` smallint(5) unsigned NOT NULL,
  `status` tinyint(3) unsigned NOT NULL default '0',
  PRIMARY KEY  (`id`),
  KEY `track_id` (`track_id`),
  KEY `tag_id` (`tag_id`)
) ENGINE=InnoDB AUTO_INCREMENT=26661380 DEFAULT CHARSET=utf8;

CREATE TABLE `track_title` (
  `id` int(10) unsigned NOT NULL,
  `title` varchar(1000) default NULL,
  `title_sch` varchar(1000) default NULL,
  `version` varchar(100) default NULL,
  PRIMARY KEY  (`id`),
  KEY `title` (`title`(255)),
  KEY `title_sch` (`title_sch`(255)),
  KEY `title_cluster` USING BTREE (`id`,`title`(255),`title_sch`(255))
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

EDIT

The server i'm running on has about 4gbs of ram.

Querying without the first % definitely speeds up the query. It runs in 4 secs approximately. However, that doesn't give me the results i need because I need all titles with the word "love" in it not just ones that start with love. What do you guys think about this? I have a title_sch field in the track_title table. That field is a normalized version of the title where any punctuation has been removed, title has been changed to lowercase etc. Basically a field where the title has been cured. I was thinking what if i move the regular title back to the tracks table and make to track_title table a one to many relationship from track to track_title. But i would change track_title to have one word entry per word pertaining to each track. I know this will increase the table size tremendously but i then i can index track_title.title_sch and instead of using like I can use "=" right? Or when the user searches "love hurts" i can do use "IN" instead. What do you guys think? I'm open to other suggestions.

Best Answer

With the disclaimer that I'm not specifically a MySQL expert:

My guess is that either it's failing to whittle down the huge number of rows that result from such as huge join, or it can't index the title well because the LIKE starts with a wildcard.

Testing the second is easy, see if it runs better without a leading wildcard. If this is the case we can look for solutions.

For the first (and potentially the second as well), I would suggest breaking it up by doing a smaller join into a temporary table, then joining that with the other tables.

For example, if we can assume that there are foreign rows in all tables for a given track, we can SELECT, ORDER and LIMIT based on track_title alone into a temporary table. Then join in all the other tables.