Mysql – Optimize a query with two range conditions

MySQLoptimizationperformancequery-performancesubquery

I have a structure similar to this one:

CREATE TABLE `author` (
  `id` int(11) unsigned NOT NULL auto_increment,
  `name` varchar(255) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1;

CREATE TABLE `book` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `author_id` int(11) unsigned NOT NULL,
  `org` int(11) unsigned NOT NULL,
  `country` char(3) NOT NULL,
  `publish_date` date NOT NULL,
  `price` decimal(6,2) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `author_id` (`author_id`),
  KEY `publish_date` (`publish_date`),
  KEY `i0` (`country`, `org`, `author_id`, `price`, `publish_date`),
  KEY `i1` (`country`, `org`, `author_id`, `publish_date`, `price`)
) ENGINE=InnoDB CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1;

Here is a SQL Fiddle

So I'm trying to execute a query with two range conditions

SELECT
  id as author_id,
  (SELECT COUNT(DISTINCT `book`.`id`)+1 
   FROM `book` 
   WHERE 
      `book`.`org` = 1
      AND `book`.`country` = 'USA' 
      AND `book`.`publish_date` BETWEEN '2010-04-30' AND '2011-04-30'
      AND `book`.`author_id` = `author`.`id`
      AND `book`.`price` < 50
  ) AS `books_under_fifty`
FROM `author` 
ORDER BY books_under_fifty desc;

but the optimizer uses only part of my index: const,const,db_9_6349e2.author.id from i0

is there a way to optimize it?

Best Answer

Well, let's think out of the box. It seems that there are two sets of results -- the authors with some such books, and those without. The first case is more efficiently done via:

SELECT  author_id,
        COUNT(*)+1 AS books_under_fifty
    FROM  `book`
    WHERE  `org` = 1
      AND  `country` = 'USA'
      AND  publish_date >= '2010-05-01'
      AND  publish_date <  '2010-05-01' + INTERVAL 1 YEAR
      AND  `price` < 50

That would probably be best served by

INDEX(country, org, publish_date, price, author_id)

If you are happy with the performance of that, then you can think out of the box in finding "the rest of the authors".

Back to the 2-ranges problem. That can sometimes be solved by Partitioning. Not knowing the distribution of the data (is one year a small subset of the total? or what about price?), I can't say which would be better:

PARTITION BY RANGE(TO_DAYS(publish_date))

Partitioning by price is out, since DECIMAL can't be used. Storing price as number of cents and using PARTITION BY RANGE(cents) would work, but clumsily.

Then make 20-50 partitions of the partition key. This would give you "partition pruning" for one 'range', but then the index needs to be tweaked for the other. Assuming you partition by publish_date:

INDEX(country, org, price)
PRIMARY KEY(id, publish_date)

Meanwhile, ...

Don't use utf8 if you are using standard country_codes; use ascii, at least for that column.

What is org does it distinguish the book? Or are there some missing columns? Is there no other "unique" column(s)?

Another variant that might be better:

 PRIMARY KEY(country, org, price,  -- for clustering
             publish_date,   -- because partitioning requires it
             id)           -- to assure uniqueness
 INDEX(id)

This variant assumes you have no 'natural' PK. And it will be more efficient because it clusters the desired rows together. I did not include author_id in the PK since it is readily available in the row.

add index

ADD INDEX ac(author_id, country) tricked it into using i0. Go figure. Note: I left (author_id) intact and first in the list of indexes; so it is not just the order of the indexes.

I'm using Percona's 5.6.22-71.0-log .

Note that there are only 4 different country values, evenly distributed. But there are many different author_id, making it by itself more "selective".

Related Solutions

Mysql – Are two indexes needed

An index can seek by a subset of characters, as long as you're searching from the left. E.g., "Inter%" can seek, "%net" will not.

However, the first character is not necessarily the character under which the article would be sorted. "The Internet" should go under "I", not "T". You probably need two fields, DisplayTitle and SortTitle; a single-character index on the latter may be worthwhile, but most likely a full-length index will be just fine.

Indexes are typically B-trees, and a seek will jump to the right location about equally quickly whether you have 10 or 100 entries per page. Scans are another matter, but I'd start with the simplest solution and add an extra index only if performance proves inadequate in practice.

Mysql – Finding rows for a specified date range

Another way to get the result is this. It finds first all groups that the teacher has surely taught (or is going to) by checking that she has started within the month and then in another subquery it finds - for every group - the last teacher that started at the first day of the month or earlier.

With the unique index you have on the table, the second subquery should be quite efficient. The first subquery would benefit from an index on (teacherid, startdate, groupid):

SELECT groupid
FROM pupilgroupteacher
WHERE teacherid = @teacher 
  AND startdate >= @month + INTERVAL 1 DAY
  AND startdate < @month + INTERVAL 1 MONTH

UNION DISTINCT

SELECT gg.groupid
FROM 
    ( SELECT DISTINCT groupid
      FROM pupilgroupteacher
    ) AS gd
  JOIN pupilgroupteacher AS gg
    ON  gg.groupid = gd.groupid
    AND gg.startdate =
        ( SELECT MAX(gi.startdate)
          FROM pupilgroupteacher AS gi 
          WHERE gi.groupid = gd.groupid
            AND gi.startdate < @month + INTERVAL 1 DAY
        )
WHERE gg.teacherid = @teacher ;

Best Answer

Related Solutions

Mysql – Are two indexes needed

Mysql – Finding rows for a specified date range

Related Question