Mysql – Deciding which MySQL execution plan is better

execution-planMySQLperformancequery-performance

Background:

I have a query that I'm trying to speed up, involving 3 tables:

  • omgenvelope has 5 million rows
  • omgcust has 195 rows
  • omginput has 35836 rows

I'm obtaining the distinct omginputs referenced by certain omgenvelopes and grabbing some data from omgcust belonging to the omginput.

My two query variants so far:

SELECT customer, custname, idomginput, filename, omginput.laststamp
 FROM omginput, omgcust
 WHERE idomginput IN
     (SELECT DISTINCT(lastinput) FROM omgenvelope WHERE envstate NOT IN (42,46,65,70,250))
   AND idomgcust=customer ORDER BY omginput.laststamp, filename;

vs.

SELECT DISTINCT customer, custname, idomginput, filename, omginput.laststamp
  FROM omgenvelope
  JOIN omginput ON (idomginput=lastinput AND envstate NOT IN (42,46,65,70,250))
  JOIN omgcust ON (idomgcust=customer) ORDER BY omginput.laststamp, filename;

Query plans:

EXPLAIN on the upper query, with the IN (SELECT...) subquery, shows this plan (abbreviated):

  select_type: PRIMARY
        table: omgcust
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 195
        Extra: Using temporary; Using filesort
--------------
  select_type: PRIMARY
        table: omginput
         type: ref
          key: fk_omginput_omgcust1_idx
      key_len: 4
          ref: tracksdb.omgcust.idomgcust
         rows: 109
        Extra: Using where
--------------
  select_type: DEPENDENT SUBQUERY
        table: omgenvelope
         type: index_subquery
          key: fk_omgenvelope_omginput1_idx
      key_len: 4
          ref: func
         rows: 867
        Extra: Using where

An the new query that I'm, considering to use instead:

  select_type: SIMPLE
        table: omgenvelope
         type: range
          key: fk_omgenvelope_omgstate1_idx
      key_len: 4
          ref: NULL
         rows: 886220
        Extra: Using where; Using temporary; Using filesort
--------------
  select_type: SIMPLE
        table: omginput
         type: eq_ref
          key: PRIMARY
      key_len: 4
          ref: tracksdb.omgenvelope.lastinput
         rows: 1
--------------
  select_type: SIMPLE
        table: omgcust
         type: eq_ref
          key: PRIMARY
      key_len: 4
          ref: tracksdb.omginput.customer
         rows: 1

Question:

How do I quantify which one is more efficient?
The old one with the DEPENDENT SUBQUERY looks pretty simple, and doesn't yield so many rows in each step (195 * 109 * 867). The new candidate on the other hand shows (886220 * 1 * 1) rows.

So my question is how to interpret these estimated numbers of rows.
Can I just multiply them and compare the products, or do I need to think more about what the RDBMS is actually doing when executing the queries?

Best Answer

Multiplying the rows is invalid for several reasons:

  • Many times, the rows examined are an approximation (based on statistics, not accurate), good for query plan selection, but not for performance calculation
  • The total number of rows examined on a nested loop join (A, B) is not rows_examined_on_table_A * rows_examined_on_table_B, but rows_examined_on_table_A + rows_returned_from_table_A * rows_examined_on_table_B. Where clauses can make a huge difference on that, although it is true that the mentioned calculations is many times used as a broad approximation, assuming the indexes are being created properly and the main causes of filtering out results.
  • Modern MySQL versions do not use always a nested loop join approach for executing joins and subqueries. Check 5.6 subquery optimizations and other optimization documents on the same manual. Additionally, some of the new optimization techniques do not modify the predicted examined rows, which at some times can be way lower than the one printed, even if it has been calculated exactly.

In particular, on your first query, you are hitting a well know MySQL bug? limitation? in which an IN subquery is identified as a DEPENDENT SUBQUERY, even if it really isn't, forcing the outmost query to be executed without an index (full table scan) in order to test all possible values of the first table. That is usually an indicator that it is a bad query. It seems not to bee too bad in this case, as the table is small, but it is usually an indication of bad performance.

The other thing that should bring your attention is the Using temporary; Using filesort. Filtering is not the only thing where you should focus, as these extra pieces of information are telling you that a large sorting has to be done using a temporary table (that may or may not end up on disk, but at least has to be materialized). That is another indicator of potential bad performance, that in some cases can be avoided with the right indexes.

I will not tell you which is the right query to use (partially, because I do not know all the variables: indexes, tables structure, etc., and in most cases it will depend on the particular hardware/resources available), but I will tell you the tools to decide:

  • Profile the query- obtain the post execution times and how much of it it is being invested in what. You can use SHOW PROFILES up to 5.5, and the performance_schema starting with 5.6.

  • As time can be sometimes variable (for example, depending on other queries being executed at the same time, depending on the buffer pool contents) Obtain post-execution statistics with SHOW SESSION STATUS. In particular:

    FLUSH SESSION STATUS;
    SELECT ... ;
    SHOW STATUS like 'Hand%';
    

    will give you the exact number of handler calls done (approximately, the number of rows read and written for that particular query- although that is not 100% accurate, as it depends on the particular engine implementation).

    You may also want to monitor other status variables, like the created temporary tables, created temporary tables on disk and sort passes/sorted rows.

    All of these will give you post-execution, exact, time-independent parameters to evaluate the performance of a query. Percona even has a patch for the slow log to output that information on the logs instead of using performance_schema.

With those extra pieces of information you will be able to evaluate more objectively which query is better, and not relying exclusively on EXPLAIN, which only provides limited pre-execution information.