I've got these two tables:
books (about 2 million records in total)
Containing a unique ID (isbn) aswell as an integer value representing popularity (index, but not unique).
category_pointers (also about 2 million records)
Containing a category_id (representing "fiction" [example]) and book_id.
Now, I want to fetch all books in the category "fiction" (category_id 22) and order them by popularity. This is my query:
SELECT * FROM books JOIN (
SELECT category_pointers.isbn FROM category_pointers
RIGHT JOIN books ON books.isbn = category_pointers.isbn
WHERE category_pointers.category_id = 22
ORDER BY books.popularity DESC
LIMIT 0, 30
) AS t ON t.isbn = books.isbn
The query takes about 2 minutes to run. Does anybody know how to point me in the right direction?
Thanks!
UPDATE
Definition for books
CREATE TABLE `books` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`isbn` bigint(20) NOT NULL,
`title` varchar(255) COLLATE utf8_swedish_ci NOT NULL,
`published` varchar(255) COLLATE utf8_swedish_ci NOT NULL,
`pages` int(11) NOT NULL,
`weight` double(8,2) NOT NULL,
`type` varchar(255) COLLATE utf8_swedish_ci NOT NULL,
`created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`added_to_index` int(11) NOT NULL DEFAULT '0',
`last_refine` int(11) NOT NULL DEFAULT '0',
`sub_title` varchar(255) COLLATE utf8_swedish_ci NOT NULL,
`author` varchar(255) COLLATE utf8_swedish_ci NOT NULL,
`author_id` int(11) NOT NULL,
`publisher` varchar(255) COLLATE utf8_swedish_ci NOT NULL,
`publisher_id` int(11) NOT NULL,
`language` varchar(255) COLLATE utf8_swedish_ci NOT NULL,
`details_url` varchar(255) COLLATE utf8_swedish_ci NOT NULL,
`image` varchar(255) COLLATE utf8_swedish_ci NOT NULL,
`popularity` int(11) NOT NULL,
`update_authority` int(11) NOT NULL DEFAULT '0',
`edition` int(11) NOT NULL DEFAULT '0',
`format` varchar(255) COLLATE utf8_swedish_ci NOT NULL,
`band_type` varchar(255) COLLATE utf8_swedish_ci NOT NULL,
`saved_image` varchar(255) COLLATE utf8_swedish_ci NOT NULL,
`has_downloaded_image` int(11) NOT NULL DEFAULT '0',
`published_year` int(11) NOT NULL DEFAULT '0',
`last_price_check` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `books_isbn_unique` (`isbn`),
KEY `books_popularity_index` (`popularity`),
KEY `books_added_to_index_index` (`added_to_index`),
KEY `books_published_year_index` (`published_year`),
KEY `books_last_price_check_index` (`last_price_check`)
) ENGINE=InnoDB AUTO_INCREMENT=1748944 DEFAULT CHARSET=utf8 COLLATE=utf8_swedish_ci
And category_pointers
CREATE TABLE `category_pointers` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`isbn` bigint(20) NOT NULL,
`category_id` int(11) NOT NULL,
`created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`id`),
KEY `category_pointers_isbn_index` (`isbn`),
KEY `category_pointers_category_id_index` (`category_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1866535 DEFAULT CHARSET=utf8 COLLATE=utf8_swedish_ci
Best Answer
I see some issues:
category_pointers
is missing a unique key on its natural primary key. Try using (category
,isbn
) as the primary key. It may be helpful to have the reverse index ('isbn', 'category'). This query can run strictly on the index.category_pointers
has a surrogate key for its primary index. I would drop it and use the natural key (category
,isbn
). I see this pattern far too frequently. A surrogate key on a join table is acode smell
to me.books_popularity_index
may work better as a compound index defined as (popularity
,isbn
). In index on (isbn
,popularity
) may help some search strategies.books
table has a valid natural keyisbn
and a surrogate keyid
. I would drop theid
column as the primary index and useisbn
as the primary key.RIGHT JOIN
rather thanINNER JOIN
overrides the choice of driving table. However, you are missing indexes which may help the query run faster.Items 1 and 3 may improve your query performance. Items 2 and 4 may be related to possible fragmentation of index ordered database tables.
I see two likely search paths:
isbn
,popularity
) index might help.Running the query prefixed by
EXPLAIN
should tell you how the data is being searched for. This may help understand why the query is slow. In most cases, each of the tables should be accessed by an index. Only the index oncategory_pointers
should be required. (This may require both indexes specified above.)