Mysql – Different behavior between MySQL 5.5 and 5.6

group byMySQLmysql-5.5mysql-5.6

I have this schema with data:

CREATE TABLE `ticket` (
  `id` INT NOT NULL,
  `name` VARCHAR(45) NULL,
  `type` INT(11) NULL,
  PRIMARY KEY (`id`));

CREATE TABLE `task` (
  `id` int(11) NOT NULL,
  `name` varchar(45) DEFAULT NULL,
  `ticket_id` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `fk_task_1_idx` (`ticket_id`),
  CONSTRAINT `fk_task_1` FOREIGN KEY (`ticket_id`) REFERENCES `ticket` (`id`)
) ENGINE=InnoDB;

CREATE TABLE `planing` (
  `id` int(11) NOT NULL,
  `date` datetime DEFAULT NULL,
  `task_id` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `fk_planing_1_idx` (`task_id`),
  CONSTRAINT `fk_planing_1` FOREIGN KEY (`task_id`) REFERENCES `task` (`id`)
) ENGINE=InnoDB ;

INSERT INTO `ticket` (`id`, `name`, `type`) VALUES (1, 'ticket_1', 1);
INSERT INTO `task` (`id`, `name`, `ticket_id`) VALUES (1, 'task_1', 1);
INSERT INTO `task` (`id`, `name`, `ticket_id`) VALUES (2, 'task_2', 1);
INSERT INTO `planing` (`id`, `date`, `task_id`) VALUES (1, '2016-06-01 05:00', 2);

INSERT INTO `ticket` (`id`, `name`, `type`) VALUES (2, 'ticket_2', 1);
INSERT INTO `task` (`id`, `name`, `ticket_id`) VALUES (3, 'task_3', 2);
INSERT INTO `task` (`id`, `name`, `ticket_id`) VALUES (4, 'task_4', 2);
INSERT INTO `planing` (`id`, `date`, `task_id`) VALUES (2, '2016-06-01 05:00', 3);
INSERT INTO `planing` (`id`, `date`, `task_id`) VALUES (3, '2016-07-01 05:00', 4);

INSERT INTO `ticket` (`id`, `name`, `type`) VALUES (3, 'ticket_3', 1);

When I perform a select:

SELECT ticket.id AS ticket_id, task.id AS task_id, planing.date as due_date 
FROM ticket
  LEFT JOIN task ON task.ticket_id = ticket.id
  LEFT JOIN planing ON (task.id = planing.task_id AND planing.date IS NOT NULL)
WHERE ticket.type = 1
GROUP BY ticket.id
ORDER BY planing.date ASC
;

on MySQL 5.5 I get

ticket_id | task_id | due_date
1         |1        |null   
3         |null     |null   
2         |3        |2016-06-01 05:00:00  

and on MySQL 5.6 I get

ticket_id | task_id | due_date
3         |null     |null   
1         |2        |2016-06-01 05:00:00    
2         |3        |2016-06-01 05:00:00  

example link

How to get the same behavior on MySQL 5.5 as on MySQL 5.6 (app server uses the MySQL 5.5 version)?

Best Answer

You just need to stop relying on the non-standard behaviour provided by GROUP BY when the only_full_group_by SQL mode is not enabled.

Normally, when grouping, you cannot retrieve a column that is not specified in the GROUP BY clause unless you wrap it in an aggregate function. That is because that column may have many different values (remember, we are talking about grouped rows) and there is no way to specify which value of the many should be used in the output. There is a recent extension to that behaviour according to which columns that are functionally dependent on the GROUP BY columns may be selected without specifying them in the GROUP BY, but MySQL has so far went the simple road of offering you just two modes: either you cannot retrieve non-GROUP BY columns without aggregation at all or you can retrieve any such columns, regardless of functional dependence.

So, in this case, you are retrieving columns task_id and due_date that clearly can have various values per ticket_id. Which value is ultimately selected is out of your control, and that is the problem. The solution is to define for yourself exactly which row from the other tables you want to get when you are grouping by ticket.id and implement the logic the way it can be followed unambiguously.

For instance, if you want to go by the behaviour you are observing in MySQL 5.6, you could define it as

the earliest non-NULL due_date and its corresponding task_id per ticket_id.

Based on that, you could define the logic as follows:

SELECT
  ticket.id AS ticket_id,
  (
    SELECT
      MIN(p2.task_id)
    FROM
      task AS t2
      INNER JOIN planing AS p2 ON t2.id = p2.task_id
    WHERE
      p2.date = MIN(planing.date)
      AND t2.ticket_id = ticket.id
  ) AS task_id,
  MIN(planing.date) AS due_date
FROM
  ticket 
  LEFT JOIN task ON task.ticket_id = ticket.id
  LEFT JOIN planing ON task.id = planing.task_id
WHERE
  ticket.type = 1
GROUP BY
  ticket.id
ORDER BY
  MIN(planing.date) ASC
;

This is certainly much more coding than your original query, but the extra bit of code is not redundant, because this way you are making your results predictable.

The only simpler alternative that might emulate the MySQL 5.6 behaviour could be to rewrite the query like this:

SELECT
  ticket.id AS ticket_id,
  task_id AS task_id,
  planing.date as due_date
FROM
  ticket 
  LEFT JOIN
    task
    INNER JOIN planing ON task.id = planing.task_id
  ON task.ticket_id = ticket.id
WHERE
  ticket.type = 1
GROUP BY
  ticket.id
ORDER BY
  MIN(planing.date) ASC
;

In my tests, the above query returned the same results for 5.5 as the original did for 5.6 for your example. Although the two queries should logically give equivalent results, they are not exactly equivalent: the rewritten query explicitly changes the logical order of joins. Perhaps that prompted MySQL to choose a different path through the data.

But whatever the reason for the different output, please keep in mind that there is no guarantee the behaviour will match for every scenario. I will repeat, you need to stop relying on the behaviour that is not documented, or, actually, documented to be undefined:

The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.