Mysql – Really slow join, limit and order_by query

join;MySQL

I've got these two tables:

books (about 2 million records in total)

Containing a unique ID (isbn) aswell as an integer value representing popularity (index, but not unique).

category_pointers (also about 2 million records)

Containing a category_id (representing "fiction" [example]) and book_id.

Now, I want to fetch all books in the category "fiction" (category_id 22) and order them by popularity. This is my query:

SELECT * FROM books JOIN (
  SELECT category_pointers.isbn FROM category_pointers 
  RIGHT JOIN books ON books.isbn = category_pointers.isbn 
  WHERE category_pointers.category_id = 22
  ORDER  BY books.popularity DESC
  LIMIT  0, 30
) AS t ON t.isbn = books.isbn

The query takes about 2 minutes to run. Does anybody know how to point me in the right direction?

Thanks!

UPDATE
Definition for books

CREATE TABLE `books` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `isbn` bigint(20) NOT NULL,
  `title` varchar(255) COLLATE utf8_swedish_ci NOT NULL,
  `published` varchar(255) COLLATE utf8_swedish_ci NOT NULL,
  `pages` int(11) NOT NULL,
  `weight` double(8,2) NOT NULL,
  `type` varchar(255) COLLATE utf8_swedish_ci NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `added_to_index` int(11) NOT NULL DEFAULT '0',
  `last_refine` int(11) NOT NULL DEFAULT '0',
  `sub_title` varchar(255) COLLATE utf8_swedish_ci NOT NULL,
  `author` varchar(255) COLLATE utf8_swedish_ci NOT NULL,
  `author_id` int(11) NOT NULL,
  `publisher` varchar(255) COLLATE utf8_swedish_ci NOT NULL,
  `publisher_id` int(11) NOT NULL,
  `language` varchar(255) COLLATE utf8_swedish_ci NOT NULL,
  `details_url` varchar(255) COLLATE utf8_swedish_ci NOT NULL,
  `image` varchar(255) COLLATE utf8_swedish_ci NOT NULL,
  `popularity` int(11) NOT NULL,
  `update_authority` int(11) NOT NULL DEFAULT '0',
  `edition` int(11) NOT NULL DEFAULT '0',
  `format` varchar(255) COLLATE utf8_swedish_ci NOT NULL,
  `band_type` varchar(255) COLLATE utf8_swedish_ci NOT NULL,
  `saved_image` varchar(255) COLLATE utf8_swedish_ci NOT NULL,
  `has_downloaded_image` int(11) NOT NULL DEFAULT '0',
  `published_year` int(11) NOT NULL DEFAULT '0',
  `last_price_check` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  UNIQUE KEY `books_isbn_unique` (`isbn`),
  KEY `books_popularity_index` (`popularity`),
  KEY `books_added_to_index_index` (`added_to_index`),
  KEY `books_published_year_index` (`published_year`),
  KEY `books_last_price_check_index` (`last_price_check`)
) ENGINE=InnoDB AUTO_INCREMENT=1748944 DEFAULT CHARSET=utf8 COLLATE=utf8_swedish_ci

And category_pointers

CREATE TABLE `category_pointers` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `isbn` bigint(20) NOT NULL,
  `category_id` int(11) NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  PRIMARY KEY (`id`),
  KEY `category_pointers_isbn_index` (`isbn`),
  KEY `category_pointers_category_id_index` (`category_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1866535 DEFAULT CHARSET=utf8 COLLATE=utf8_swedish_ci

Best Answer

I see some issues:

  1. category_pointers is missing a unique key on its natural primary key. Try using (category, isbn) as the primary key. It may be helpful to have the reverse index ('isbn', 'category'). This query can run strictly on the index.
  2. category_pointers has a surrogate key for its primary index. I would drop it and use the natural key (category, isbn). I see this pattern far too frequently. A surrogate key on a join table is a code smell to me.
  3. The books_popularity_index may work better as a compound index defined as (popularity, isbn). In index on (isbn, popularity) may help some search strategies.
  4. The books table has a valid natural key isbn and a surrogate key id. I would drop the id column as the primary index and use isbn as the primary key.
  5. Use of RIGHT JOIN rather than INNER JOIN overrides the choice of driving table. However, you are missing indexes which may help the query run faster.

Items 1 and 3 may improve your query performance. Items 2 and 4 may be related to possible fragmentation of index ordered database tables.

I see two likely search paths:

  • small category: category - isbn - popularity. This requires a sort on populary after all books in the category are found. A (isbn, popularity) index might help.
  • large category: popularity - isbn - category. Finds first thirty matching records without requiring a sort. A large unpopular category may perform slowly.

Running the query prefixed by EXPLAIN should tell you how the data is being searched for. This may help understand why the query is slow. In most cases, each of the tables should be accessed by an index. Only the index on category_pointers should be required. (This may require both indexes specified above.)