I've been trying to figure this problem out for a while now, so I decided I'll try my luck here.
I have a very complex online shop DB structure (multiple shops in one database, hundreds of thousands of products) and I'm querying MySQL to return products from a specified category. Products are in a many-to-many (M:N) relationship to categories, so I have tables product
, category
and product_category
(which stands for a weak entity type).
Table schemas are as follows:
CREATE TABLE `product` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) COLLATE utf8_czech_ci NOT NULL,
`description` text COLLATE utf8_czech_ci NOT NULL,
`date` timestamp NOT NULL DEFAULT current_timestamp(),
`date_edited` timestamp NOT NULL DEFAULT current_timestamp(),
`visibility` tinyint(1) NOT NULL DEFAULT 1,
`main_slider` tinyint(1) NOT NULL DEFAULT 0,
`bought` int(11) NOT NULL DEFAULT 0,
`price_action` int(11) NOT NULL DEFAULT 0,
`quantity_action` int(11) NOT NULL DEFAULT 0,
`duration_action` timestamp NOT NULL DEFAULT current_timestamp(),
`new` tinyint(1) NOT NULL DEFAULT 1,
`type` int(2) NOT NULL DEFAULT 0,
`origin_url` varchar(250) COLLATE utf8_czech_ci DEFAULT NULL,
`origin_price` int(11) NOT NULL DEFAULT 0,
`origin_price_dph` int(11) NOT NULL DEFAULT 0,
`rating` int(11) NOT NULL DEFAULT 0,
`ratingcount` int(11) NOT NULL DEFAULT 0,
`facebook_flag` tinyint(1) DEFAULT 0,
`salebot` tinyint(1) NOT NULL DEFAULT 0,
`shop_supplier_id` int(11) NOT NULL,
`shop_id` int(11) NOT NULL,
`product_global_id` int(11) DEFAULT NULL,
`prioritize` tinyint(1) NOT NULL DEFAULT 0,
`free_delivery` tinyint(1) NOT NULL DEFAULT 0,
PRIMARY KEY (`id`),
UNIQUE KEY `product_global_id_2` (`product_global_id`,`shop_id`),
KEY `shop_id` (`shop_id`),
KEY `shop_supplier_id` (`shop_supplier_id`),
KEY `product_global_id` (`product_global_id`),
KEY `rating` (`rating`),
KEY `date` (`date`),
KEY `quantity_action` (`quantity_action`),
KEY `duration_action` (`duration_action`),
KEY `main_slider` (`main_slider`),
KEY `shop_id_2` (`shop_id`,`main_slider`),
KEY `shop_id_3` (`shop_id`,`prioritize`),
KEY `shop_id_4` (`shop_id`,`date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_czech_ci;
CREATE TABLE `product_category` (
`category_id` int(11) NOT NULL,
`product_id` int(11) NOT NULL,
PRIMARY KEY (`category_id`,`product_id`),
KEY `category` (`category_id`),
KEY `product` (`product_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_czech_ci;
And the query I'm using:
SELECT product.id,
product.name,
product.shop_supplier_id,
product.price_action,
product.quantity_action,
product.duration_action,
variant.id as variant_id,
IF(variant.price_default IS NULL, variant_global.price_vat, variant.price_default) as price_default,
variant.price_dph as price,
variant_global.delivery_date,
variant.value AS variant_name,
IF(product.type = 0, variant_global.voc_silver, variant.voc_dph) as voc_dph,
shop_supplier.text_available,
shop_supplier.text_unavailable,
IF(product.type = 0, variant_global.available = 1, variant.availability = 1) as avail,
variant.availability_type,
supplier.id as supplier_id,
variant_global.price_vat as price_default_ds,
product.type
FROM `product`
LEFT JOIN shop_supplier ON (`product`.`shop_supplier_id` = `shop_supplier`.`id`)
LEFT JOIN supplier ON (shop_supplier.supplier_id = `supplier`.`id`)
LEFT JOIN variant_global ON ((SELECT id FROM variant_global WHERE variant_global.product_global_id=product.product_global_id ORDER BY `available` DESC LIMIT 1) = `variant_global`.`id`)
LEFT JOIN product_global ON (variant_global.product_global_id = product_global.id)
LEFT JOIN image ON (`product`.`id` = `image`.`product_id`)
LEFT JOIN variant ON ((SELECT id FROM variant WHERE `product_id`=product.id ORDER BY `availability` DESC LIMIT 1) = `variant`.`id`)
LEFT JOIN product_category ON (`product`.`id` = `product_category`.`product_id`)
WHERE (product.shop_id=100000)
AND (product.visibility= 1)
AND (image.id IS NOT NULL)
AND (IF(product.type = 0, variant_global.available = 1, variant.availability = 1) = (CASE WHEN display_available = 1 AND display_unavailable = 1 THEN IF(product.type = 0, variant_global.available = 1, variant.availability = 1) WHEN display_available = 1 THEN 1 ELSE 0 END))
AND (variant_global.deleted IS NULL OR variant_global.deleted = 0)
AND (product_category.category_id= 5)
GROUP BY product.id
ORDER BY prioritize DESC, RAND() LIMIT 5;
The query is called multiple times on a single page for different categories and is really slow, I've tried all different keys and combinations and couldn't figure it out.
The EXPLAIN of the query outputs this:
+------+--------------------+------------------+--------+-----------------------------------------------+---------------------+---------+-------------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------------+------------------+--------+-----------------------------------------------+---------------------+---------+-------------------------------------+------+----------------------------------------------+
| 1 | PRIMARY | product_category | ref | PRIMARY,category,product | PRIMARY | 4 | const | 4264 | Using index; Using temporary; Using filesort |
| 1 | PRIMARY | product | eq_ref | PRIMARY,shop_id,shop_id_2,shop_id_3,shop_id_4 | PRIMARY | 4 | dropohs.product_category.product_id | 1 | Using where |
| 1 | PRIMARY | shop_supplier | eq_ref | PRIMARY | PRIMARY | 4 | dropohs.product.shop_supplier_id | 1 | |
| 1 | PRIMARY | supplier | eq_ref | PRIMARY | PRIMARY | 4 | dropohs.shop_supplier.supplier_id | 1 | Using where; Using index |
| 1 | PRIMARY | variant | eq_ref | PRIMARY | PRIMARY | 4 | func | 1 | Using where |
| 1 | PRIMARY | image | ref | PRIMARY,product_id_2,product_id | product_id_2 | 4 | dropohs.product_category.product_id | 1 | Using where; Using index |
| 1 | PRIMARY | variant_global | eq_ref | PRIMARY | PRIMARY | 4 | func | 1 | Using where |
| 3 | DEPENDENT SUBQUERY | variant | ref | product_id,product_id_2 | product_id_2 | 4 | dropohs.product.id | 1 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | variant_global | ref | product_global_id,product_global_id_2 | product_global_id_2 | 4 | dropohs.product.product_global_id | 1 | Using where; Using index |
+------+--------------------+------------------+--------+-----------------------------------------------+---------------------+---------+-------------------------------------+------+----------------------------------------------+
If you need any further information or schemas of the other tables used in the query, let me know, but I omited them for now as I don't think they are the cause of the problem as all of them are properly using indexes. Thanks.
Best Answer
LEFT JOIN image
..AND (image.id IS NOT NULL)
. If you don't want products without images, thenINNER JOIN image
and leave out theAND image.id ...
criteria. Same withLEFT JOIN product_category
, if you want a not-null category in theWHERE
, don't use aLEFT JOIN
,[INNER] JOIN
is what you are really requesting.Ordering with
RAND()
is probably also a cause of slowness.Your subqueries on
variant*
availability
should be things you try to use window functions for in later MySQL versions.Try to incorporate the all categories into the query and get the results once rather than many times.
Also maybe try to cache this query result out of the database so you don't need to do it as frequently.