MySQL SELECT slow, but only 2 x 300K rows and indexes

join;MySQLperformanceselect

Have the MySQL SELECT query below that is awfully slow.

It takes ~1.0 seconds to execute despite having only 300K rows and indexes, so I would love to find a way to get it to execute faster since it's a query that needs to be run again and again.

The query:

SELECT p.id, p.image, c.name, s.name, MIN(p.saleprice)
FROM products p 
JOIN shops s ON p.shopid = s.id 
JOIN products_category pc ON p.id = pc.product_id 
JOIN categories c ON pc.category_id = c.id
WHERE p.brand_id > 0
AND pc.category_id = 46
AND pc.active = 1
AND p.price > 0
AND p.saleprice > 0
AND p.saleprice < p.price
AND (p.last_seen > DATE_SUB(NOW(), INTERVAL 2 DAY))
GROUP BY p.image

The query returns 960 rows.

The table products has 300.000 rows and these columns + one index:

CREATE TABLE `products` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(512) DEFAULT NULL,
  `shopid` int(11) DEFAULT NULL,
  `image` varchar(512) DEFAULT NULL,
  `price` int(11) DEFAULT NULL,
  `saleprice` int(11) DEFAULT NULL,
  `last_seen` datetime DEFAULT NULL,
  `brand_id` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `brand_id etc` (`brand_id`,`shopid`,`last_seen`,`price`,`saleprice`)
) ENGINE=InnoDB AUTO_INCREMENT=391671 DEFAULT CHARSET=utf8

The table products_categories also has 300.000 rows and these columns + two indexes:

CREATE TABLE `products_category` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `product_id` int(11) DEFAULT NULL,
  `category_id` int(11) DEFAULT NULL,
  `active` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `category_id etc` (`category_id`,`active`),
  KEY `product_id etc` (`product_id`,`active`)
) ENGINE=InnoDB AUTO_INCREMENT=373364 DEFAULT CHARSET=utf8

The shops table. It has 15 rows and these columns:

CREATE TABLE `shops` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(256) DEFAULT NULL,
  `active` int(11) DEFAULT '1',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=56 DEFAULT CHARSET=utf8

Based on similar questions here, I have tried nesting things with an inner select:

SELECT p.id, p.image, c.name, s.name, MIN(p.saleprice)
FROM 
(SELECT * FROM products WHERE p.brand_id > 0 AND price > 0 AND saleprice > 0 AND saleprice < price AND (p.last_seen > DATE_SUB(NOW(), INTERVAL 3 DAY))) p 
JOIN shops s ON p.shopid = s.id 
JOIN products_category pc ON p.id = pc.product_id 
JOIN categories c ON pc.category_id = c.id 
WHERE pc.category_id = 46
AND pc.active = 1
GROUP BY p.image

It didn't help. The version with the inner select takes ~1,3 seconds to execute.

The problem seems to be the join between products and products_category, i.e. the two big tables with 300K rows each.

The GROUP BY p.image is necessary (some products are just different sizes, so we use p.image to filter these out).

Maybe there's a trick I can do with my indexes? Or can any of you spot something else I should optimize?

EXPLAIN of the query:

id  select_type table   partitions  type    possible_keys                   key             key_len ref             rows    filtered    Extra
1   SIMPLE      c       \N          const   PRIMARY                         PRIMARY         4       const           1       100.00      Using temporary; Using filesort
1   SIMPLE      pc      \N          ref     category_id etc,product_id etc  category_id etc 10      const,const     43104   100.00      Using where
1   SIMPLE      p       \N          eq_ref  PRIMARY,brand_id etc            PRIMARY         4       pc.product_id   1       5.00        Using where
1   SIMPLE      s       \N          eq_ref  PRIMARY                         PRIMARY         4       p.shopid        1       100.00      \N

Best Answer

KEY brand_id etc (brand_id,shopid,last_seen,price,saleprice)

This index does not support this query well.

The first field in the index is brand_id, but you are only filtering by "> 0".
Since I doubt you have many brands with negative id's, this probably isn't going to help very much.

The next field in the index is shopid, which doesn't appear in your where clause at all.

Based on this cursory analysis (and MySQL will do a lot more than that), this index will probably be discounted and the table scanned serially instead.

A better discriminator for choosing the rows you want returned would be last_seen. Consider adding an index on that (or with that as its first field) and you should see some improvement.