Mysql – MariaDB 5.5 vs MySQL 5.7 query optimizers

mariadbMySQLoptimizationquery-performance

I have a query that optimizes differently in MariaDB 5.5.37 and
MySQL 5.0 and MySQL 5.7. When poorly optimized it takes about 400
seconds to run, but on MySQL 5.7 which optimizes it differently,
it runs in about 3 seconds.

I've tried a variety of different versions of MySQL and MariaDB, and
just discovered that MySQL 5.7 optimizes it differently from the others
I have tried.

The query is generated by the Webtrees geneology software.

I'm wondering if anyone here can tease out what causes the different
optimizations and whether it is possible to reorder the query to
force a better query plan.

Here is the table definition:

CREATE TABLE `wt_name` (
  `n_file` int(11) NOT NULL,
  `n_id` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
  `n_num` int(11) NOT NULL,
  `n_type` varchar(15) COLLATE utf8_unicode_ci NOT NULL,
  `n_sort` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `n_full` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `n_surname` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `n_surn` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `n_givn` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `n_soundex_givn_std` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `n_soundex_surn_std` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `n_soundex_givn_dm` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `n_soundex_surn_dm` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`n_id`,`n_file`,`n_num`),
  KEY `ix1` (`n_full`,`n_id`,`n_file`),
  KEY `ix2` (`n_surn`,`n_file`,`n_type`,`n_id`),
  KEY `ix3` (`n_givn`,`n_file`,`n_type`,`n_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

The table has 186,123 rows. Identical data loaded in both databases.

Here is the version info for tha MariaDB 5.5 and MySQL 5.7 servers:

MariaDB> SHOW VARIABLES LIKE "%version%";
+-------------------------+---------------------+
| Variable_name           | Value               |
+-------------------------+---------------------+
| innodb_version          | 5.5.37-MariaDB-34.0 |
| protocol_version        | 10                  |
| version                 | 5.5.37-MariaDB-log  |
| version_comment         | Source distribution |
| version_compile_machine | x86_64              |
| version_compile_os      | Linux               |
+-------------------------+---------------------+

mysql> SHOW VARIABLES LIKE "%version%";
+-------------------------+------------------------------+
| Variable_name           | Value                        |
+-------------------------+------------------------------+
| innodb_version          | 5.7.4                        |
| protocol_version        | 10                           |
| version                 | 5.7.4-m14                    |
| version_comment         | MySQL Community Server (GPL) |
| version_compile_machine | x86_64                       |
| version_compile_os      | Linux                        |
+-------------------------+------------------------------+

The query in question (copied from the slow query log, with some indentation
added for clarity):

SELECT SQL_CACHE n2.n_surn, n1.n_surname, n1.n_id
  FROM `wt_name` n1  JOIN
    (SELECT n_surn, n_file
       FROM `wt_name`
      WHERE n_file=4 AND n_type!='_MARNM' AND
            n_surn NOT IN ('', '@N.N.')
     GROUP BY n_surn COLLATE 'utf8_unicode_ci', n_file
    ) n2
    ON (n1.n_surn=n2.n_surn COLLATE 'utf8_unicode_ci' AND n1.n_file=n2.n_file)
    AND n_type!='_MARNM'

Here is the the explain output on MariaDB (the slow one, typical
query time 400s):

+------+-------------+------------+-------+---------------+------+---------+----
---------------------+--------+-------------------------------------------------
----------+
| id   | select_type | table      | type  | possible_keys | key  | key_len | ref
                     | rows   | Extra
          |
+------+-------------+------------+-------+---------------+------+---------+----
---------------------+--------+-------------------------------------------------
----------+
|    1 | PRIMARY     | n1         | ALL   | ix2           | NULL | NULL    | NUL
L                    | 184314 | Using where
          |
|    1 | PRIMARY     | <derived2> | ref   | key0          | key0 | 4       | hgr
c_webtrees.n1.n_file |     10 | Using where
          |
|    2 | DERIVED     | wt_name    | range | ix2           | ix2  | 768     | NUL
L                    |  92159 | Using where; Using index; Using temporary; Using
 filesort |
+------+-------------+------------+-------+---------------+------+---------+----
---------------------+--------+-------------------------------------------------
----------+

Here is the explain output on MySQL 5.7 (the fast one, typical
query time under 3s):

+----+-------------+------------+------------+-------+---------------+------+---
------+----------------+-------+----------+-------------------------------------
----------------------+
| id | select_type | table      | partitions | type  | possible_keys | key  | ke
y_len | ref            | rows  | filtered | Extra
                      |
+----+-------------+------------+------------+-------+---------------+------+---
------+----------------+-------+----------+-------------------------------------
----------------------+
|  1 | PRIMARY     | <derived2> | NULL       | ALL   | NULL          | NULL | NU
LL    | NULL           | 94067 |   100.00 | NULL
                      |
|  1 | PRIMARY     | n1         | NULL       | ref   | ix2           | ix2  | 77
2     | func,n2.n_file |    19 |   100.00 | Using index condition
                      |
|  2 | DERIVED     | wt_name    | NULL       | range | ix2           | ix2  | 76
8     | NULL           | 94067 |   100.00 | Using where; Using index; Using temp
orary; Using filesort |
+----+-------------+------------+------------+-------+---------------+------+---
------+----------------+-------+----------+-------------------------------------
----------------------+

Here's a description of purpose of the query:

"This query is generated by the "Individual list" page. The page
allows users to list all the individuals in a family tree who have
a particular surname, or initial. It does include an option to list
all surnames or all individuals."

I've tried to provide all of the relevant and useful info. Please let me
know if I've left out anything important. I'd be very grateful if anyone has
any pointers to improving this query.

Thanks,

Mark

Best Answer

First, I would try an index on (n_file, n_surn, n_type, n_id, n_surname) for this query. You may have to change the GROUP BY clause for the index to be used effectively by all versions of the optimizer:

GROUP BY n_file, n_surn COLLATE 'utf8_unicode_ci'

Second, why the derived table, the group by and the join? I think the query is equivalent to the much simpler:

SELECT SQL_CACHE 
    n2.n_surn, n1.n_surname, n1.n_id
FROM wt_name n1 
WHERE n_file = 4 
  AND n_surn NOT IN ('', '@N.N.')
  AND n_type <> '_MARNM' ;