MySQL Performance – Indexes Not Used When Joining Tables

indexjoin;MySQLorder-byperformance

I have two tables, the first table contains all articles / blog posts within a CMS. Some of these articles may also appear in a magazine, in which case they have a foreign key relationship with another table that contains magazine specific information.

Here is a simplified version of the create table syntax for these two tables with some non-essential rows stripped out:

CREATE TABLE `base_article` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `date_published` datetime DEFAULT NULL,
  `title` varchar(255) NOT NULL,
  `description` text,
  `content` longtext,
  `is_published` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `base_article_date_published` (`date_published`),
  KEY `base_article_is_published` (`is_published`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

CREATE TABLE `mag_article` (
    `basearticle_ptr_id` int(11) NOT NULL,
    `issue_slug` varchar(8) DEFAULT NULL,
    `rubric` varchar(75) DEFAULT NULL,
    PRIMARY KEY (`basearticle_ptr_id`),
    KEY `mag_article_issue_slug` (`issue_slug`),
    CONSTRAINT `basearticle_ptr_id_refs_id` FOREIGN KEY (`basearticle_ptr_id`) REFERENCES `base_article` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

The CMS contains around 250,000 articles total and I have written a simple Python script that can be used to populate a test database with sample data if they want to replicate this issue locally.

If I select from one of these tables, MySQL has no problem picking an appropriate index or retrieving articles quickly. However, when the two tables are joined together in a simple query such as:

SELECT * FROM `base_article` 
INNER JOIN `mag_article` ON (`mag_article`.`basearticle_ptr_id` = `base_article`.`id`)
WHERE is_published = 1
ORDER BY `base_article`.`date_published` DESC
LIMIT 30

MySQL fails to pick an appropriate index and performance plummets. Here is the relevant explain extended (the execution time for which is over a second):

+----+-------------+--------------+--------+-----------------------------------+---------+---------+----------------------------------------+-------+----------+---------------------------------+
| id | select_type |    table     |  type  |           possible_keys           |   key   | key_len |                  ref                   | rows  | filtered |              Extra              |
+----+-------------+--------------+--------+-----------------------------------+---------+---------+----------------------------------------+-------+----------+---------------------------------+
|  1 | SIMPLE      | mag_article  | ALL    | PRIMARY                           | NULL    | NULL    | NULL                                   | 23830 | 100.00   | Using temporary; Using filesort |
|  1 | SIMPLE      | base_article | eq_ref | PRIMARY,base_article_is_published | PRIMARY | 4       | my_test.mag_article.basearticle_ptr_id |     1 | 100.00   | Using where                     |
+----+-------------+--------------+--------+-----------------------------------+---------+---------+----------------------------------------+-------+----------+---------------------------------+
  • EDIT SEPT 30: I can remove the WHERE clause from this query, but the EXPLAIN still looks the same and the query is still slow.

One potential solution is to force an index. Running the same query with FORCE INDEX (base_articel_date_published) results in a query that executes in around 1.6 milliseconds.

+----+-------------+--------------+--------+---------------+-----------------------------+---------+-------------------------+------+-----------+-------------+
| id | select_type |    table     |  type  | possible_keys |             key             | key_len |           ref           | rows | filtered  |    Extra    |
+----+-------------+--------------+--------+---------------+-----------------------------+---------+-------------------------+------+-----------+-------------+
|  1 | SIMPLE      | base_article | index  | NULL          | base_article_date_published |       9 | NULL                    |   30 | 833396.69 | Using where |
|  1 | SIMPLE      | mag_article  | eq_ref | PRIMARY       | PRIMARY                     |       4 | my_test.base_article.id |    1 | 100.00    |             |
+----+-------------+--------------+--------+---------------+-----------------------------+---------+-------------------------+------+-----------+-------------+

I would prefer not to have to force an index on this query if I can avoid it, for several reasons. Most notably, this basic query can be filtered / modified in a variety of ways (such as filtering by the issue_slug) after which base_article_date_published may no longer be the best index to use.

Can anyone suggest a strategy for improving performance for this query?

Best Answer

What about this this should remove the need for an "Using temporary; Using filesort" because the data is in the right sort already.

You need to know the trick why MySQL needs "Using temporary; Using filesort" to remove that need.

See second sqlfriddle for an explain about removing the need

SELECT
      *
    FROM base_article

    STRAIGHT_JOIN 
      mag_article
    ON
      (mag_article.basearticle_ptr_id = base_article.id)

    WHERE
      base_article.is_published = 1

    ORDER BY
      base_article.date_published DESC

see http://sqlfiddle.com/#!2/302710/2

Works pretty good i needed this also some time ago for Country / city tables see demo here with example data http://sqlfiddle.com/#!2/b34870/41

Edited you also may want to analyse this answer if base_article.is_published = 1 always returns 1 record like your explain explained an INNER JOIN deliverd table may give better performance like the queries in the answer below

https://stackoverflow.com/questions/18738483/mysql-slow-query-using-filesort/18774937#18774937