Mysql – Optimization of a query on a table that represents a weak entity type

indexMySQLoptimizationschema

I've been trying to figure this problem out for a while now, so I decided I'll try my luck here.
I have a very complex online shop DB structure (multiple shops in one database, hundreds of thousands of products) and I'm querying MySQL to return products from a specified category. Products are in a many-to-many (M:N) relationship to categories, so I have tables product, category and product_category (which stands for a weak entity type).

Table schemas are as follows:

CREATE TABLE `product` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) COLLATE utf8_czech_ci NOT NULL,
  `description` text COLLATE utf8_czech_ci NOT NULL,
  `date` timestamp NOT NULL DEFAULT current_timestamp(),
  `date_edited` timestamp NOT NULL DEFAULT current_timestamp(),
  `visibility` tinyint(1) NOT NULL DEFAULT 1,
  `main_slider` tinyint(1) NOT NULL DEFAULT 0,
  `bought` int(11) NOT NULL DEFAULT 0,
  `price_action` int(11) NOT NULL DEFAULT 0,
  `quantity_action` int(11) NOT NULL DEFAULT 0,
  `duration_action` timestamp NOT NULL DEFAULT current_timestamp(),
  `new` tinyint(1) NOT NULL DEFAULT 1,
  `type` int(2) NOT NULL DEFAULT 0,
  `origin_url` varchar(250) COLLATE utf8_czech_ci DEFAULT NULL,
  `origin_price` int(11) NOT NULL DEFAULT 0,
  `origin_price_dph` int(11) NOT NULL DEFAULT 0,
  `rating` int(11) NOT NULL DEFAULT 0,
  `ratingcount` int(11) NOT NULL DEFAULT 0,
  `facebook_flag` tinyint(1) DEFAULT 0,
  `salebot` tinyint(1) NOT NULL DEFAULT 0,
  `shop_supplier_id` int(11) NOT NULL,
  `shop_id` int(11) NOT NULL,
  `product_global_id` int(11) DEFAULT NULL,
  `prioritize` tinyint(1) NOT NULL DEFAULT 0,
  `free_delivery` tinyint(1) NOT NULL DEFAULT 0,
  PRIMARY KEY (`id`),
  UNIQUE KEY `product_global_id_2` (`product_global_id`,`shop_id`),
  KEY `shop_id` (`shop_id`),
  KEY `shop_supplier_id` (`shop_supplier_id`),
  KEY `product_global_id` (`product_global_id`),
  KEY `rating` (`rating`),
  KEY `date` (`date`),
  KEY `quantity_action` (`quantity_action`),
  KEY `duration_action` (`duration_action`),
  KEY `main_slider` (`main_slider`),
  KEY `shop_id_2` (`shop_id`,`main_slider`),
  KEY `shop_id_3` (`shop_id`,`prioritize`),
  KEY `shop_id_4` (`shop_id`,`date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_czech_ci;

CREATE TABLE `product_category` (
  `category_id` int(11) NOT NULL,
  `product_id` int(11) NOT NULL,
  PRIMARY KEY (`category_id`,`product_id`),
  KEY `category` (`category_id`),
  KEY `product` (`product_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_czech_ci;

And the query I'm using:

SELECT product.id, 
       product.name, 
       product.shop_supplier_id, 
       product.price_action, 
       product.quantity_action, 
       product.duration_action, 
       variant.id as variant_id, 
       IF(variant.price_default IS NULL, variant_global.price_vat, variant.price_default) as price_default, 
       variant.price_dph as price, 
       variant_global.delivery_date, 
       variant.value AS variant_name, 
       IF(product.type = 0, variant_global.voc_silver, variant.voc_dph) as voc_dph, 
       shop_supplier.text_available, 
       shop_supplier.text_unavailable, 
       IF(product.type = 0, variant_global.available = 1, variant.availability = 1) as avail, 
       variant.availability_type, 
       supplier.id as supplier_id, 
       variant_global.price_vat as price_default_ds, 
       product.type 
      FROM `product` 
      LEFT JOIN shop_supplier ON (`product`.`shop_supplier_id` = `shop_supplier`.`id`) 
      LEFT JOIN supplier ON (shop_supplier.supplier_id = `supplier`.`id`)      
      LEFT JOIN variant_global ON ((SELECT id FROM variant_global WHERE variant_global.product_global_id=product.product_global_id ORDER BY `available` DESC LIMIT 1) = `variant_global`.`id`) 
      LEFT JOIN product_global ON (variant_global.product_global_id = product_global.id) 
      LEFT JOIN image ON (`product`.`id` = `image`.`product_id`) 
      LEFT JOIN variant ON ((SELECT id FROM variant WHERE `product_id`=product.id ORDER BY `availability` DESC LIMIT 1) = `variant`.`id`) 
      LEFT JOIN product_category ON (`product`.`id` = `product_category`.`product_id`) 
        WHERE (product.shop_id=100000) 
          AND (product.visibility= 1) 
          AND (image.id IS NOT NULL) 
          AND (IF(product.type = 0, variant_global.available = 1, variant.availability = 1) = (CASE WHEN display_available = 1 AND display_unavailable = 1 THEN IF(product.type = 0, variant_global.available = 1, variant.availability = 1) WHEN display_available = 1 THEN 1 ELSE 0 END)) 
          AND (variant_global.deleted IS NULL OR variant_global.deleted = 0) 
          AND (product_category.category_id= 5) 
     GROUP BY product.id 
     ORDER BY prioritize DESC, RAND() LIMIT 5;

The query is called multiple times on a single page for different categories and is really slow, I've tried all different keys and combinations and couldn't figure it out.

The EXPLAIN of the query outputs this:

+------+--------------------+------------------+--------+-----------------------------------------------+---------------------+---------+-------------------------------------+------+----------------------------------------------+
| id   | select_type        | table            | type   | possible_keys                                 | key                 | key_len | ref                                 | rows | Extra                                        |
+------+--------------------+------------------+--------+-----------------------------------------------+---------------------+---------+-------------------------------------+------+----------------------------------------------+
|    1 | PRIMARY            | product_category | ref    | PRIMARY,category,product                      | PRIMARY             | 4       | const                               | 4264 | Using index; Using temporary; Using filesort |
|    1 | PRIMARY            | product          | eq_ref | PRIMARY,shop_id,shop_id_2,shop_id_3,shop_id_4 | PRIMARY             | 4       | dropohs.product_category.product_id |    1 | Using where                                  |
|    1 | PRIMARY            | shop_supplier    | eq_ref | PRIMARY                                       | PRIMARY             | 4       | dropohs.product.shop_supplier_id    |    1 |                                              |
|    1 | PRIMARY            | supplier         | eq_ref | PRIMARY                                       | PRIMARY             | 4       | dropohs.shop_supplier.supplier_id   |    1 | Using where; Using index                     |
|    1 | PRIMARY            | variant          | eq_ref | PRIMARY                                       | PRIMARY             | 4       | func                                |    1 | Using where                                  |
|    1 | PRIMARY            | image            | ref    | PRIMARY,product_id_2,product_id               | product_id_2        | 4       | dropohs.product_category.product_id |    1 | Using where; Using index                     |
|    1 | PRIMARY            | variant_global   | eq_ref | PRIMARY                                       | PRIMARY             | 4       | func                                |    1 | Using where                                  |
|    3 | DEPENDENT SUBQUERY | variant          | ref    | product_id,product_id_2                       | product_id_2        | 4       | dropohs.product.id                  |    1 | Using where; Using index                     |
|    2 | DEPENDENT SUBQUERY | variant_global   | ref    | product_global_id,product_global_id_2         | product_global_id_2 | 4       | dropohs.product.product_global_id   |    1 | Using where; Using index                     |
+------+--------------------+------------------+--------+-----------------------------------------------+---------------------+---------+-------------------------------------+------+----------------------------------------------+

If you need any further information or schemas of the other tables used in the query, let me know, but I omited them for now as I don't think they are the cause of the problem as all of them are properly using indexes. Thanks.

Best Answer

LEFT JOIN image .. AND (image.id IS NOT NULL). If you don't want products without images, then INNER JOIN image and leave out the AND image.id ... criteria. Same with LEFT JOIN product_category, if you want a not-null category in the WHERE, don't use a LEFT JOIN, [INNER] JOIN is what you are really requesting.

Ordering with RAND() is probably also a cause of slowness.

Your subqueries on variant* availability should be things you try to use window functions for in later MySQL versions.

Try to incorporate the all categories into the query and get the results once rather than many times.

Also maybe try to cache this query result out of the database so you don't need to do it as frequently.