MySQL – How Derived Tables are Implemented

derived-tablesjoin;MySQLsubquery

When MySQL makes a derived table as a result of

SELECT (SELECT ...) etc
or SELECT * FROM a JOIN (SELECT * from B) etc
or SELECT * FROM (SELECT ...)

Are these derived temporary tables created in-memory or the file-system? Does it depend on the size of the derived table?

Best Answer

This is tricky question, specially since 5.6, in which the optimizer has changed the way subqueries are executed. Specially, having into account that the definition of "derived table" may change person to person

From 4.1 to 5.5, in many cases, either a temporary table would be created or a cross join will be done between tables, filtering later. In some cases, like the infamous SELECT ... IN (SELECT ...) a derived table will not be created, and it will be executed as a correlated subquery (one row at a time). See the examples here: http://forums.mysql.com/read.php?115,576862,576999#msg-576999

On that same thread, Roy Liseng comment the great subquery improvements of 5.6: http://forums.mysql.com/read.php?115,576862,577215#msg-577215 You can see a summary of the improvements on the 5.6 subquery optimization on the manual, including the creation of indexes on materialization, or its conversion into semi-joins. More details on some of the Oracle engineer blogs.

I think that I had to mention this because there are a lot of things going on on the optimizer, and not always a temporary table is created first "as is".

Regarding the other question, MySQL always tries to create internal temporary tables on MEMORY first, but failsback to disk if they are too large (larger than min(tmp_table_size, max_heap_table_size) or some other restrictions (like having BLOBS).

Related Solutions

Mysql – Subqueries run very fast individually, but when joined are very slow

You don't need all the derived tables. You are joining the basic (product) too many times. You can write the query joining it only once.

Compound indices are a must for EAV designs. Try adding an index on (attribute_id, product_id, value) and then the query:

SELECT t0.id, 
       t1.`value` AS length, 
       t2.`value` AS height, 
       t3.`value` AS family
FROM
  products t0

INNER JOIN 
  product_eav_decimal t1
    ON  t1.product_id = t0.id  
    AND t1.attribute_id = 91
    AND t1.`value` BETWEEN 15 AND 35

LEFT JOIN
  product_eav_decimal t2
    ON  t2.product_id = t0.id  
    AND t2.attribute_id = 80  
-- 
-- 
--

LEFT JOIN                              -- LEFT or INNER join
  product_eav_decimal t6
    ON  t6.product_id = t0.id  
 -- AND t6.attribute_id = 

ORDER BY t0.id ASC ;

MySQL Best Practices – Using Unions or Derived Tables

In your example query, neither way is particularly better, because the inner queries have their own where clauses, are generating a very small result-set, and are able to do this very efficiently presumably because the invoice column in pending and done are indexed.

If you were doing something more complex, and especially without such highly-selective where clauses in the inter queries, the union would likely perform better... because in the derived table scenario, the derived table is materialized before the joins to the outer tables and indexes on the tables used in the inner query can't be used to handle the join... on the other hand, with UNION, the optimizer could use indexes on pending/done if they were appropriate.

Personally, I'd be inclined to solve this one by writing one or more stored procedures to encapsulate the union queries and return the answer as a result set using an unbounded SELECT.

MySQL supports a very useful extension that enables the use of regular SELECT statements (that is, without using cursors or local variables) inside a stored procedure. The result set of such a query is simply sent directly to the client.

^{http://dev.mysql.com/doc/refman/5.5/en/stored-routines-syntax.html}

These are called "unbounded." So, a stored procedure that takes the invoice id as an input parameter, does the SELECT ... UNION ALL ... SELECT can encapsulate everything and give you a very clean way to ask the database for what you need and get a record-set in return.

CALL get_invoice_customer_location_detail_by_invoice_id(123456); # long, but much more appealing

If you subsequently decide union or derived really was The One True Way after all, then you only have to update that in one place, in the procedure definition.

Free tip: UNION means UNION DISTINCT in non-dinosaur versions of MySQL, while some older versions will interpret it as UNION ALL. It's probably good practice to explicitly use the one you actually want, so you always get the behavior you expect. It's also generally good from a performance standpoint to use UNION ALL unless you need UNION DISTINCT because it's one less thing for the optimizer to have to deal with, but this is much more the case with large result sets, where a lot of time could be spent de-duplicating a set of data that is already free of duplicates.

Best Answer

Related Solutions

Mysql – Subqueries run very fast individually, but when joined are very slow

MySQL Best Practices – Using Unions or Derived Tables

Related Question