Mysql – Subqueries run very fast individually, but when joined are very slow

join;MySQLoptimizationsubquery

ypercube solved the problem. Subqueries were totally unnecessary, and the whole thing works with plain joins. It is still strange that MySQL's optimizer could not make use of my original query, though. See below for the question and lots of details. Plus a complete solution at the bottom of my question. It is based on ypercube's answer.

Each subquery is very fast, well under 1 second.
The 5-6 subqueries are joined (some LEFT, some INNER), and the time mushrooms to 400 seconds.

The overall query I'm using for testing returns only 441 rows.

I tried putting each of the subqueries in a "CREATE TABLE" query. Each one was done in well under 1 second. Then I redid the outer query using those newly created tables, and it also ran in well under 1 second. So there is no actual problem with the joins. I put indexes on id for my created tables. All tables are joined on matching id=id.

How can I make MySQL perform the query efficiently? Must I use temporary tables? I've already written a bunch of PHP code to put together the multiple subquery joins so I would rather just figure out how to make that work, if possible.

I tried using "STRAIGHT_JOIN" keyword and removing the outer ORDER BY. That reduced query time to 90s. But I should be getting 1s max.

I tried STRAIGHT_JOIN with ORDER BY and it took 235 seconds. So it seems like the outer ORDER BY is a major performance issue.

EDIT:

Tested using temporary tables. Query runs very fast. But there has got to be a way to make mysql do it that fast with JOINS.

Also, slow query log shows:

Rows_examined: 484006914

484 million rows looks like a cartesian product. Why is it examining so many rows?

The query has this structure:

SELECT t0.`id`, t1.`length`, t2.`height`, t3.`family`
FROM
`products` t0
INNER JOIN
(
SELECT t1.`id`, t2.`value` AS `length`
FROM `products` t1
INNER JOIN `product_eav_decimal` t2
ON t1.`id` = t2.`product_id`
WHERE t2.`attribute_id` = 91
AND t2.`value` BETWEEN 15 AND 35
) t1

ON t0.`id` = t1.`id`

LEFT JOIN
(
SELECT t1.`id`, t2.`value` AS `height`
FROM `products` t1
INNER JOIN `product_eav_decimal` t2
ON t1.`id` = t2.`product_id`
WHERE t2.`attribute_id` = 80
# no other conditions
) t2
ON t0.`id` = t2.`id`

INNER JOIN
(
.
.
.
) t6
ON t0.`id` = t6.`id`
ORDER BY t0.`id` ASC

…etc LEFT JOINS are used when no other conditions in the subquery other than the attribute_id. INNER JOIN used when there is some other condition. This creates a valid search result. The query works, it just takes 400 seconds instead of 0.04.

If no one knows how to make the JOIN syntax work, then I'll use temporary tables since that seems to work.

TABLES:

1.) products

CREATE TABLE `products` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `sku` varchar(127) NOT NULL COMMENT '3char vencode + model',
 `model` varchar(127) NOT NULL,
 `vendor_id` int(11) DEFAULT NULL,
 `updated` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
 PRIMARY KEY (`id`),
 UNIQUE KEY `sku` (`sku`),
 KEY `model` (`model`),
 KEY `vendor_id` (`vendor_id`),
 CONSTRAINT `FK1` FOREIGN KEY (`vendor_id`) REFERENCES `vendors` (`id`) ON DELETE SET NULL ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=153282 DEFAULT CHARSET=utf8

2.) decimals

CREATE TABLE `product_eav_decimal` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `product_id` int(11) NOT NULL,
 `attribute_id` int(11) DEFAULT NULL,
 `value` decimal(11,3) DEFAULT NULL,
 `last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
 PRIMARY KEY (`id`),
 UNIQUE KEY `natural_key` (`product_id`,`attribute_id`,`value`),
 UNIQUE KEY `product_id_2` (`product_id`,`attribute_id`),
 KEY `last_update` (`last_update`),
 KEY `product_id` (`product_id`),
 KEY `attribute_id` (`attribute_id`),
 KEY `value` (`value`),
 CONSTRAINT `FK1` FOREIGN KEY (`product_id`) REFERENCES `products` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
 CONSTRAINT `FK2` FOREIGN KEY (`attribute_id`) REFERENCES `attributes` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=370772 DEFAULT CHARSET=utf8 COLLATE=utf8_bin

3.) varchar (references another table, values_varchar table for actual varchar values)

CREATE TABLE `product_eav_varchar` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `product_id` int(11) DEFAULT NULL,
 `attribute_id` int(11) DEFAULT NULL,
 `value_id` int(11) DEFAULT NULL,
 `last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
 PRIMARY KEY (`id`),
 UNIQUE KEY `natural_key` (`product_id`,`attribute_id`,`value_id`),
 KEY `last_update` (`last_update`),
 KEY `product_id` (`product_id`),
 KEY `value_id` (`value_id`),
 KEY `attribute_id` (`attribute_id`),
 CONSTRAINT `FK1` FOREIGN KEY (`value_id`) REFERENCES `values_varchar` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
 CONSTRAINT `FK2` FOREIGN KEY (`product_id`) REFERENCES `products` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
 CONSTRAINT `FK3` FOREIGN KEY (`attribute_id`) REFERENCES `attributes` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=86049 DEFAULT CHARSET=utf8 COLLATE=utf8_bin

Adapted from ypercube's answer:

SELECT t0.id, 
       t1.`value` AS length, 
       t2.`value` AS height, 
       t3.`value` AS family,
       t5.`value` AS type
FROM
  products t0

INNER JOIN # INNER used when search criteria
# length (only searched values)
  product_eav_decimal t1
    ON  t1.product_id = t0.id  
    AND t1.attribute_id = 91
    AND t1.`value` BETWEEN 15 AND 35 # search criteria

LEFT JOIN # LEFT used when no search criteria
# height (all, including blank/null)
  product_eav_decimal t2
    ON  t2.product_id = t0.id  
    AND t2.attribute_id = 80  

LEFT JOIN  # LEFT - no search critera
# family - varchar type requires extra join to values table
  product_eav_varchar t3
    ON  t3.product_id = t0.id  
    AND t3.attribute_id = 77
LEFT JOIN # LEFT join to values table matches eav table join
values_varchar t4
    ON t3.value_id = t4.id
# search criteria would be here. see next

INNER JOIN # INNER - search criteria below
# type - varchar requires extra join, see below
  product_eav_varchar t5
    ON t5.product_id = t0.id
    AND t5.attribute_id = 76
INNER JOIN # INNER join to values table matches eav table join
values_varchar t6
    ON t5.value_id = t6.id
    # search criteria
    AND (t6.value LIKE "%sofa%" COLLATE utf8_general_ci OR t6.value LIKE "%chair%" COLLATE utf8_general_ci)

ORDER BY t0.id ASC;

The query works. It runs in a few milliseconds. If search terms or range limits are given, it returns ONLY results that match, using INNER JOINs. Where there are no criteria, it uses LEFT JOINs to returns any values (including NULL/blank).

August 2014 update – there are now 400-500 thousand rows in the products table and the query style used above still runs lightning fast. It seems that joins are far faster than subqueries in MySQL.

Best Answer

You don't need all the derived tables. You are joining the basic (product) too many times. You can write the query joining it only once.

Compound indices are a must for EAV designs. Try adding an index on (attribute_id, product_id, value) and then the query:

SELECT t0.id, 
       t1.`value` AS length, 
       t2.`value` AS height, 
       t3.`value` AS family
FROM
  products t0

INNER JOIN 
  product_eav_decimal t1
    ON  t1.product_id = t0.id  
    AND t1.attribute_id = 91
    AND t1.`value` BETWEEN 15 AND 35

LEFT JOIN
  product_eav_decimal t2
    ON  t2.product_id = t0.id  
    AND t2.attribute_id = 80  
-- 
-- 
--

LEFT JOIN                              -- LEFT or INNER join
  product_eav_decimal t6
    ON  t6.product_id = t0.id  
 -- AND t6.attribute_id = 

ORDER BY t0.id ASC ;