Mysql – inconsistent distinct + order by behavior between thesql 5.6 and 5.7

distinctMySQLorder-by

I use MySQL 5.6 and 5.7 on Ubuntu 16.04.3 LTS and I have a problem with the following query (which is derived from drupal 7 but have simplified it in order to pinpoint the issue):

select distinct node.nid, node.title, users.name
  from node
  left join users on node.uid = users.uid
  order by node.nid desc
  limit 4;

My servers are provisioned with the exact same ansible script which uses the following my.cnf:

[mysqld_safe]
socket          = /var/run/mysqld/mysqld.sock
nice            = 0

[mysqld]
user            = mysql
pid-file        = /var/run/mysqld/mysqld.pid
socket          = /var/run/mysqld/mysqld.sock
port            = 3306
basedir         = /usr
datadir         = /var/lib/mysql
tmpdir          = /tmp
lc-messages-dir = /usr/share/mysql
skip-external-locking
bind-address            = 0.0.0.0
key_buffer_size         = 16M
max_allowed_packet      = 1G
thread_stack            = 192K
thread_cache_size       = 8
query_cache_type        = 0
slow_launch_time=1
innodb_buffer_pool_size = 1G
innodb_file_per_table
open-files-limit=10000
log_error = /var/log/mysql/error.log
long_query_time=2
slow_query_log=on
slow_query_log_file=/var/log/mysql/mysql-slow-queries.log
general_log=off
general_log_file=/var/log/mysql/general.log
innodb_flush_log_at_trx_commit=0

I import my database and then connect to CLI using:

echo "create database drupal" | mysql -u root -ppass
mysql -u root -ppass drupal < /vagrant/2017-10-13--08-00-backup-xxxx-db.sql
mysql -u root -ppass drupal -A

On servers with 5.6 (show variables like 'version'; gives me 5.6.16-1~exp1) the query returns the following rows which are in the correct descending order (note: used some x characters to anonymize dataset):

+-------+------------------------------------------+------+
| nid   | title                                    | name |
+-------+------------------------------------------+------+
| 56217 | A Coxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx   | ztex |
| 56216 | Estxxxxxxxxx                             | ctex |
| 56215 | Plexxxxxxxxxxxxxx                        | xovr |
| 56214 | Cloxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | xovr |
+-------+------------------------------------------+------+

But in servers with 5.7 (show variables like 'version'; gives me 5.7.20-0ubuntu0.16.04.1-log) I get the opposite nid order from what I expect:

+-----+---------------------------+------+
| nid | title                     | name |
+-----+---------------------------+------+
|   1 | AEGQ todo - January 2009  | xc   |
|   2 | AEGQ todo - February 2009 | xc   |
|   3 | AEGQ todo - March 2009    | xc   |
|   4 | AEGQ todo - May 2009      | xc   |
+-----+---------------------------+------+

In both cases the explain extended and show warnings outputs look the same.

5.6:

mysql> explain extended select distinct node.nid, node.title, users.name from node left join users on node.uid = users.uid order by node.nid desc limit 4;
+----+-------------+-------+--------+---------------+---------+---------+-----------------+------+------------+-----------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref             | rows | filtered   | Extra           |
+----+-------------+-------+--------+---------------+---------+---------+-----------------+------+------------+-----------------+
|  1 | SIMPLE      | node  | index  | NULL          | PRIMARY | 4       | NULL            |    4 | 1359550.00 | Using temporary |
|  1 | SIMPLE      | users | eq_ref | PRIMARY       | PRIMARY | 4       | drupal.node.uid |    1 |     100.00 | Using where     |
+----+-------------+-------+--------+---------------+---------+---------+-----------------+------+------------+-----------------+
2 rows in set, 1 warning (0.00 sec)

mysql> show warnings\G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: /* select#1 */ select distinct `drupal`.`node`.`nid` AS `nid`,`drupal`.`node`.`title` AS `title`,`drupal`.`users`.`name` AS `name` from `drupal`.`node` left join `drupal`.`users` on((`drupal`.`node`.`uid` = `drupal`.`users`.`uid`)) where 1 order by `drupal`.`node`.`nid` desc limit 4
1 row in set (0.00 sec)

5.7:

mysql> explain extended select distinct node.nid, node.title, users.name from node left join users on node.uid = users.uid order by node.nid desc limit 4;
+----+-------------+-------+------------+--------+---------------+---------+---------+-----------------+------+----------+-----------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref             | rows | filtered | Extra           |
+----+-------------+-------+------------+--------+---------------+---------+---------+-----------------+------+----------+-----------------+
|  1 | SIMPLE      | node  | NULL       | index  | NULL          | PRIMARY | 4       | NULL            |    4 |   100.00 | Using temporary |
|  1 | SIMPLE      | users | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | drupal.node.uid |    1 |   100.00 | Using where     |
+----+-------------+-------+------------+--------+---------------+---------+---------+-----------------+------+----------+-----------------+
2 rows in set, 2 warnings (0.00 sec)

mysql> show warnings\G
*************************** 1. row ***************************
  Level: Warning
   Code: 1681
Message: 'EXTENDED' is deprecated and will be removed in a future release.
*************************** 2. row ***************************
  Level: Note
   Code: 1003
Message: /* select#1 */ select distinct `drupal`.`node`.`nid` AS `nid`,`drupal`.`node`.`title` AS `title`,`drupal`.`users`.`name` AS `name` from `drupal`.`node` left join `drupal`.`users` on((`drupal`.`node`.`uid` = `drupal`.`users`.`uid`)) where 1 order by `drupal`.`node`.`nid` desc limit 4
2 rows in set (0.00 sec)

On both servers show variables like 'coll%'; returns the same:

+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | latin1_swedish_ci |
| collation_database   | latin1_swedish_ci |
| collation_server     | latin1_swedish_ci |
+----------------------+-------------------+

And of course, on both servers the schema and data are the same:

| node  | CREATE TABLE `node` (
  `nid` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `vid` int(10) unsigned DEFAULT NULL,
  `type` varchar(32) NOT NULL DEFAULT '',
  `language` varchar(12) NOT NULL DEFAULT '',
  `title` varchar(255) NOT NULL DEFAULT '',
  `uid` int(11) NOT NULL DEFAULT '0',
  `status` int(11) NOT NULL DEFAULT '1',
  `created` int(11) NOT NULL DEFAULT '0',
  `changed` int(11) NOT NULL DEFAULT '0',
  `comment` int(11) NOT NULL DEFAULT '0',
  `promote` int(11) NOT NULL DEFAULT '0',
  `sticky` int(11) NOT NULL DEFAULT '0',
  `tnid` int(10) unsigned NOT NULL DEFAULT '0',
  `translate` int(11) NOT NULL DEFAULT '0',
  `rh_action` int(11) DEFAULT NULL,
  `rh_redirect` longtext,
  `rh_redirect_response` int(11) DEFAULT NULL,
  PRIMARY KEY (`nid`),
  UNIQUE KEY `vid` (`vid`),
  KEY `node_changed` (`changed`),
  KEY `node_created` (`created`),
  KEY `node_frontpage` (`promote`,`status`,`sticky`,`created`),
  KEY `node_status_type` (`status`,`type`,`nid`),
  KEY `node_title_type` (`title`,`type`(4)),
  KEY `node_type` (`type`(4)),
  KEY `uid` (`uid`),
  KEY `tnid` (`tnid`),
  KEY `translate` (`translate`),
  KEY `language` (`language`)
) ENGINE=InnoDB AUTO_INCREMENT=56997 DEFAULT CHARSET=utf8 COMMENT='The base table for nodes.' |

and

| users | CREATE TABLE `users` (
  `uid` int(10) unsigned NOT NULL DEFAULT '0',
  `name` varchar(60) NOT NULL DEFAULT '',
  `pass` varchar(128) NOT NULL DEFAULT '',
  `mail` varchar(254) DEFAULT '',
  `theme` varchar(255) NOT NULL DEFAULT '',
  `signature` varchar(255) NOT NULL DEFAULT '',
  `signature_format` varchar(255) DEFAULT NULL,
  `created` int(11) NOT NULL DEFAULT '0',
  `access` int(11) NOT NULL DEFAULT '0',
  `login` int(11) NOT NULL DEFAULT '0',
  `status` tinyint(4) NOT NULL DEFAULT '0',
  `timezone` varchar(32) DEFAULT NULL,
  `language` varchar(12) NOT NULL DEFAULT '',
  `picture` int(11) NOT NULL DEFAULT '0',
  `init` varchar(254) DEFAULT '',
  `data` longblob,
  PRIMARY KEY (`uid`),
  UNIQUE KEY `name` (`name`),
  KEY `access` (`access`),
  KEY `created` (`created`),
  KEY `mail` (`mail`),
  KEY `picture` (`picture`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Stores user data.' |

To recap, what is the same:

  • O/S, kernel, software components and underlying virtualization method
  • /etc/mysql/my.cnf
  • imported database file (schema + data are the same)
  • database user
  • select query
  • explain select output
  • server, database and connection collations

What is different:

  • mysql version
  • sql_mode. As shown in my my.cnf I've left defaults for 5.6 and 5.7 which are:
    • 5.6: NO_ENGINE_SUBSTITUTION
    • 5.7: ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION

Problem:

  • order of results looks wrong on mysql 5.7

How can this be explained? Have the semantics of distinct keyword changed in MySQL 5.7?

Best Answer

We don't get to say this often but in this case:

If this happens, it's a bug.

A query that has:

SELECT
  .. DISTINCT 
  ..
ORDER BY node.nid DESC
LIMIT 4 ;

should apply the ORDER BY and LIMIT after the SELECT - whether it's a SELECT DISTINCT or not - and return the rows with the highest four nid values in descending order. In the 5.7.20 case, it seems that it doesn't and the ORDER BY is ignored (or it runs as ASC), and the rows are returned with ascending order, so it must be a bug.

The right thing to do in these cases, is to report the bug with a sample test case, to MySQL bug list.

Things you could also do - to confirm, make it easier for MySQL developers to fix, and remedy the issue until the bug is fixed - is to try:

  • remove the DISTINCT, seems useless anyway. The node.nid is unique since it's the primary key and the join is on user.uid which is the primary key of the other table, so the distinct will have no effect.
  • remove the DISTINCT and add GROUP BY node.nid, node.title, users.name or just GROUP BY node.nid, users.name.
  • remove the ONLY_FULL_GROUP_BY setting.
  • try the query in older minor versions (5.7.19, 5.7.18, ...)

And please include the findings (if the bug persists with the above tries) in your bug report to MySQL - to help them identify the cause of the problem. I suspect it's a regression due to some optimization improvement related to GROUP BY.