Mysql – Query monthly having group_concat and group by need help

group bygroup-concatenationMySQLpivot

I’m stuck on a join tables query presenting data monthly involving GROUP BY and GROUP_CONCAT.

Here’s a simple client table (DDL and DML at the bottom of this post):

id | Name
1  | Sony
2  | Toshiba
3  | Apple
4  | LG
5  | Uco

Then the event table

id | client_id | date_start
1 | 1 | 2017-01-12 18:44:42
2 | 1 | 2017-01-13 18:44:42
3 | 1 | 2017-01-14 18:44:42
4 | 1 | 2017-02-12 18:44:42
5 | 1 | 2017-03-12 18:44:42
6 | 1 | 2017-07-12 18:44:42
7 | 2 | 2017-02-12 18:44:42
8 | 2 | 2017-03-12 18:44:42
9 | 2 | 2017-04-12 18:44:42
10 | 3 | 2017-01-12 18:44:42
11 | 3 | 2017-01-14 18:44:42
12 | 3 | 2017-01-20 18:44:42
13 | 3 | 2017-03-12 18:44:42
14 | 3 | 2017-05-12 18:44:42
15 | 3 | 2017-06-12 18:44:42
16 | 4 | 2017-07-12 18:44:42
17 | 4 | 2017-07-20 18:44:42
18 | 5 | 2017-09-12 18:44:42
19 | 5 | 2017-10-12 18:44:42
20 | 5 | 2017-03-12 18:44:42

The desired result is as follows. The string number, for example (10-01-12) on Jan/Apple is formatted as id-month-day.

desired outcome

What I’ve done so far is using case when to split result monthly:

select * from (
select e.id, c.name as client,
(CASE WHEN MONTH(e.date_start) = 1 then GROUP_CONCAT(CONCAT(e.id,'-',LPAD(month(date_start),2,'0'), '-', LPAD(day(date_start),2,'0')) SEPARATOR ',')END) as jan,
(CASE WHEN MONTH(e.date_start) = 2 then GROUP_CONCAT(CONCAT(e.id,'-',LPAD(month(date_start),2,'0'), '-', LPAD(day(date_start),2,'0')) SEPARATOR ',')END) as feb,
(CASE WHEN MONTH(e.date_start) = 3 then GROUP_CONCAT(CONCAT(e.id,'-',LPAD(month(date_start),2,'0'), '-', LPAD(day(date_start),2,'0')) SEPARATOR ',')END) as mar,
(CASE WHEN MONTH(e.date_start) = 4 then GROUP_CONCAT(CONCAT(e.id,'-',LPAD(month(date_start),2,'0'), '-', LPAD(day(date_start),2,'0')) SEPARATOR ',')END) as apr,
(CASE WHEN MONTH(e.date_start) = 5 then GROUP_CONCAT(CONCAT(e.id,'-',LPAD(month(date_start),2,'0'), '-', LPAD(day(date_start),2,'0')) SEPARATOR ',')END) as may,
(CASE WHEN MONTH(e.date_start) = 6 then GROUP_CONCAT(CONCAT(e.id,'-',LPAD(month(date_start),2,'0'), '-', LPAD(day(date_start),2,'0')) SEPARATOR ',')END) as jun,
(CASE WHEN MONTH(e.date_start) = 7 then GROUP_CONCAT(CONCAT(e.id,'-',LPAD(month(date_start),2,'0'), '-', LPAD(day(date_start),2,'0')) SEPARATOR ',')END) as jul,
(CASE WHEN MONTH(e.date_start) = 8 then GROUP_CONCAT(CONCAT(e.id,'-',LPAD(month(date_start),2,'0'), '-', LPAD(day(date_start),2,'0')) SEPARATOR ',')END) as aug,
(CASE WHEN MONTH(e.date_start) = 9 then GROUP_CONCAT(CONCAT(e.id,'-',LPAD(month(date_start),2,'0'), '-', LPAD(day(date_start),2,'0')) SEPARATOR ',')END) as sep,
(CASE WHEN MONTH(e.date_start) = 10 then GROUP_CONCAT(CONCAT(e.id,'-',LPAD(month(date_start),2,'0'), '-', LPAD(day(date_start),2,'0')) SEPARATOR ',')END) as oct,
(CASE WHEN MONTH(e.date_start) = 11 then GROUP_CONCAT(CONCAT(e.id,'-',LPAD(month(date_start),2,'0'), '-', LPAD(day(date_start),2,'0')) SEPARATOR ',')END) as nov,
(CASE WHEN MONTH(e.date_start) = 12 then GROUP_CONCAT(CONCAT(e.id,'-',LPAD(month(date_start),2,'0'), '-', LPAD(day(date_start),2,'0')) SEPARATOR ',')END) as `dec` 
  
from 
 event as e 
 left join client as c on c.id=e.client_id 
group by month(date_start),client
order by client
) t 

result need work

But query above need final touch to group by client. How do I concat having group by client result as appear on the desired table above with comma as separator?

The second part is counting the sum of each data monthly. Not as important, I really need to get part one to work.

Here is the SQL for data and tables.

CREATE TABLE `client` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(128) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

LOCK TABLES `client` WRITE;

INSERT INTO `client` (`id`, `name`)
VALUES
    (1,'Sony'),
    (2,'Toshiba'),
    (3,'Apple'),
    (4,'LG'),
    (5,'Uco');


UNLOCK TABLES;

CREATE TABLE `event` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `client_id` int(11) unsigned DEFAULT NULL,
  `date_start` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  PRIMARY KEY (`id`),
  KEY `client_id` (`client_id`),
  KEY `date_start` (`date_start`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

LOCK TABLES `event` WRITE;


INSERT INTO `event` (`id`, `client_id`, `date_start`)
VALUES
    (1,1,'2017-01-12 18:44:42'),
    (2,1,'2017-01-13 18:44:42'),
    (3,1,'2017-01-14 18:44:42'),
    (4,1,'2017-02-12 18:44:42'),
    (5,1,'2017-03-12 18:44:42'),
    (6,1,'2017-07-12 18:44:42'),
    (7,2,'2017-02-12 18:44:42'),
    (8,2,'2017-03-12 18:44:42'),
    (9,2,'2017-04-12 18:44:42'),
    (10,3,'2017-01-12 18:44:42'),
    (11,3,'2017-01-14 18:44:42'),
    (12,3,'2017-01-20 18:44:42'),
    (13,3,'2017-03-12 18:44:42'),
    (14,3,'2017-05-12 18:44:42'),
    (15,3,'2017-06-12 18:44:42'),
    (16,4,'2017-07-12 18:44:42'),
    (17,4,'2017-07-20 18:44:42'),
    (18,5,'2017-09-12 18:44:42'),
    (19,5,'2017-10-12 18:44:42'),
    (20,5,'2017-03-12 18:44:42');

UNLOCK TABLES;

Best Answer

Terminology and Methodology

This kind of transformation – rows to columns – is called pivoting. It is typical to pivot data simultaneously with their aggregation, as seems to be a requirement in your case too. In SQL you can do both operations as a single logical step. Other SQL products even offer special syntactical extensions for pivoting, but there is a way to do that using more generic syntax, one that is supported by at least every major RDBMS, which includes MySQL.

The method is called conditional aggregation, and you almost nailed it. The conditional, implemented as a CASE expression in your query, is supposed to go inside the aggregate function, and the criterion that the conditional is checking on (MONTH(e.date_start) in your case) needs to be excluded from the GROUP BY.

So, instead of

SELECT
  CASE WHEN MONTH(e.date_start) = 1 THEN GROUP_CONCAT(...),
  ...
FROM
  ...
GROUP BY
  MONTH(e.date_start),
  client

it should be

SELECT
  GROUP_CONCAT(CASE WHEN MONTH(e.date_start) = 1 THEN ...),
  ...
FROM
  ...
GROUP BY
  MONTH(e.date_start),
  client

The exclusion part may seem counter-intuitive – after all, you are intending to get monthly data. However, you should keep in mind that in SQL you are grouping rows. In your case, one row is one client – therefore, grouping should be by client only. You could say that monthly grouping is implicit, as it is implemented through conditional aggregation only.

Anyway, what about the last row? The last row is special, and not just because it is a rollup row, thus representing aggregated data over the entire set. In my view, it is special more because it contains entirely different data: counts instead of concatenated strings.

Based on that fact, it seems natural to me to consider a distinct logical step – a separate SELECT – for getting the last row's results. The two result sets would then be combined into one with the help of a UNION ALL operator. This approach would make the logic clear, in my opinion: different kinds of data in the output would be accounted for by different legs of the query. And clear logic ultimately means ease of maintenance.

Solution

So, taking into account all of the above, the complete query could look like this:

SELECT
  c.name AS client,
  GROUP_CONCAT(CASE MONTH(e.date_start) WHEN  1 THEN CONCAT(e.id, '-', RIGHT(DATE(e.date_start), 5)) END SEPARATOR ',') AS jan,
  GROUP_CONCAT(CASE MONTH(e.date_start) WHEN  2 THEN CONCAT(e.id, '-', RIGHT(DATE(e.date_start), 5)) END SEPARATOR ',') AS feb,
  GROUP_CONCAT(CASE MONTH(e.date_start) WHEN  3 THEN CONCAT(e.id, '-', RIGHT(DATE(e.date_start), 5)) END SEPARATOR ',') AS mar,
  GROUP_CONCAT(CASE MONTH(e.date_start) WHEN  4 THEN CONCAT(e.id, '-', RIGHT(DATE(e.date_start), 5)) END SEPARATOR ',') AS apr,
  GROUP_CONCAT(CASE MONTH(e.date_start) WHEN  5 THEN CONCAT(e.id, '-', RIGHT(DATE(e.date_start), 5)) END SEPARATOR ',') AS may,
  GROUP_CONCAT(CASE MONTH(e.date_start) WHEN  6 THEN CONCAT(e.id, '-', RIGHT(DATE(e.date_start), 5)) END SEPARATOR ',') AS jun,
  GROUP_CONCAT(CASE MONTH(e.date_start) WHEN  7 THEN CONCAT(e.id, '-', RIGHT(DATE(e.date_start), 5)) END SEPARATOR ',') AS jul,
  GROUP_CONCAT(CASE MONTH(e.date_start) WHEN  8 THEN CONCAT(e.id, '-', RIGHT(DATE(e.date_start), 5)) END SEPARATOR ',') AS aug,
  GROUP_CONCAT(CASE MONTH(e.date_start) WHEN  9 THEN CONCAT(e.id, '-', RIGHT(DATE(e.date_start), 5)) END SEPARATOR ',') AS sep,
  GROUP_CONCAT(CASE MONTH(e.date_start) WHEN 10 THEN CONCAT(e.id, '-', RIGHT(DATE(e.date_start), 5)) END SEPARATOR ',') AS oct,
  GROUP_CONCAT(CASE MONTH(e.date_start) WHEN 11 THEN CONCAT(e.id, '-', RIGHT(DATE(e.date_start), 5)) END SEPARATOR ',') AS nov,
  GROUP_CONCAT(CASE MONTH(e.date_start) WHEN 12 THEN CONCAT(e.id, '-', RIGHT(DATE(e.date_start), 5)) END SEPARATOR ',') AS `dec`
FROM
  event AS e
  INNER JOIN client AS c ON e.client_id = c.id
GROUP BY
  c.name

UNION ALL

SELECT
  NULL,
  COUNT(MONTH(e.date_start) =  1 OR NULL),
  COUNT(MONTH(e.date_start) =  2 OR NULL),
  COUNT(MONTH(e.date_start) =  3 OR NULL),
  COUNT(MONTH(e.date_start) =  4 OR NULL),
  COUNT(MONTH(e.date_start) =  5 OR NULL),
  COUNT(MONTH(e.date_start) =  6 OR NULL),
  COUNT(MONTH(e.date_start) =  7 OR NULL),
  COUNT(MONTH(e.date_start) =  8 OR NULL),
  COUNT(MONTH(e.date_start) =  9 OR NULL),
  COUNT(MONTH(e.date_start) = 10 OR NULL),
  COUNT(MONTH(e.date_start) = 11 OR NULL),
  COUNT(MONTH(e.date_start) = 12 OR NULL)
FROM
  event AS e

;

Or, perhaps, like this, if we wanted to make it look slightly less cumbersome by eliminating repetition of some code:

SELECT
  client,
  GROUP_CONCAT(CASE month WHEN  1 THEN item END SEPARATOR ',') AS jan,
  GROUP_CONCAT(CASE month WHEN  2 THEN item END SEPARATOR ',') AS feb,
  GROUP_CONCAT(CASE month WHEN  3 THEN item END SEPARATOR ',') AS mar,
  GROUP_CONCAT(CASE month WHEN  4 THEN item END SEPARATOR ',') AS apr,
  GROUP_CONCAT(CASE month WHEN  5 THEN item END SEPARATOR ',') AS may,
  GROUP_CONCAT(CASE month WHEN  6 THEN item END SEPARATOR ',') AS jun,
  GROUP_CONCAT(CASE month WHEN  7 THEN item END SEPARATOR ',') AS jul,
  GROUP_CONCAT(CASE month WHEN  8 THEN item END SEPARATOR ',') AS aug,
  GROUP_CONCAT(CASE month WHEN  9 THEN item END SEPARATOR ',') AS sep,
  GROUP_CONCAT(CASE month WHEN 10 THEN item END SEPARATOR ',') AS oct,
  GROUP_CONCAT(CASE month WHEN 11 THEN item END SEPARATOR ',') AS nov,
  GROUP_CONCAT(CASE month WHEN 12 THEN item END SEPARATOR ',') AS `dec`
FROM
  (
    SELECT
      c.name AS client,
      MONTH(e.date_start) AS month,
      CONCAT(e.id, '-', RIGHT(DATE(e.date_start), 5)) AS item
    FROM
      event AS e
      INNER JOIN client AS c ON e.client_id = c.id
  ) AS derived
GROUP BY
  client

UNION ALL

SELECT
  NULL,
  COUNT(month =  1 OR NULL),
  COUNT(month =  2 OR NULL),
  COUNT(month =  3 OR NULL),
  COUNT(month =  4 OR NULL),
  COUNT(month =  5 OR NULL),
  COUNT(month =  6 OR NULL),
  COUNT(month =  7 OR NULL),
  COUNT(month =  8 OR NULL),
  COUNT(month =  9 OR NULL),
  COUNT(month = 10 OR NULL),
  COUNT(month = 11 OR NULL),
  COUNT(month = 12 OR NULL)
FROM
  (
    SELECT
      MONTH(e.date_start) AS month
    FROM
      event AS e
  ) AS derived

;

If you are not very familiar with the A = B OR NULL formula, think of it as an alternative to CASE WHEN A = B THEN 1 ELSE NULL END. For details on how it really works I refer you to this Stack Overflow question:

Alternative

All that nice talk about clear logic and maintainability notwithstanding, you might still like to be able to implement the query as a single SELECT. Even though we have managed to simplify the initial version by reducing code repetition, the MONTH(date_start) expression still has to be specified twice across the query, because each SELECT leg needs it, so why not try eliminating that repetition too? And if that is not enough of a reason, the alternative solution might turn out faster, maybe even noticeably. And maybe the resulting query would not look too ugly. In the end, it would just be nice to have a choice, plain and simple.

So, how would it be possible to rewrite the query using WITH ROLLUP, so that both the client details and the rollup row would be produced by the same SELECT statement (without any UNION ALL kind of cheating)?

Well, you can use the former solution as a prototype. One part of that query performs group concatenation over clients. The other part counts rows over the entire set. Now, if you want to have a single-part query, the single part must do both operations at both levels.

Which kind of information to show at which level should then be determined by another set of conditionals.

Taking the above-mentioned points into consideration, here is my attempt at a single-step query:

SELECT
  client,
  IF(client IS NULL, COUNT(month =  1 OR NULL), GROUP_CONCAT(CASE month WHEN  1 THEN item END SEPARATOR ',')) AS jan,
  IF(client IS NULL, COUNT(month =  2 OR NULL), GROUP_CONCAT(CASE month WHEN  2 THEN item END SEPARATOR ',')) AS feb,
  IF(client IS NULL, COUNT(month =  3 OR NULL), GROUP_CONCAT(CASE month WHEN  3 THEN item END SEPARATOR ',')) AS mar,
  IF(client IS NULL, COUNT(month =  4 OR NULL), GROUP_CONCAT(CASE month WHEN  4 THEN item END SEPARATOR ',')) AS apr,
  IF(client IS NULL, COUNT(month =  5 OR NULL), GROUP_CONCAT(CASE month WHEN  5 THEN item END SEPARATOR ',')) AS may,
  IF(client IS NULL, COUNT(month =  6 OR NULL), GROUP_CONCAT(CASE month WHEN  6 THEN item END SEPARATOR ',')) AS jun,
  IF(client IS NULL, COUNT(month =  7 OR NULL), GROUP_CONCAT(CASE month WHEN  7 THEN item END SEPARATOR ',')) AS jul,
  IF(client IS NULL, COUNT(month =  8 OR NULL), GROUP_CONCAT(CASE month WHEN  8 THEN item END SEPARATOR ',')) AS aug,
  IF(client IS NULL, COUNT(month =  9 OR NULL), GROUP_CONCAT(CASE month WHEN  9 THEN item END SEPARATOR ',')) AS sep,
  IF(client IS NULL, COUNT(month = 10 OR NULL), GROUP_CONCAT(CASE month WHEN 10 THEN item END SEPARATOR ',')) AS oct,
  IF(client IS NULL, COUNT(month = 11 OR NULL), GROUP_CONCAT(CASE month WHEN 11 THEN item END SEPARATOR ',')) AS nov,
  IF(client IS NULL, COUNT(month = 12 OR NULL), GROUP_CONCAT(CASE month WHEN 12 THEN item END SEPARATOR ',')) AS `dec`
FROM
  (
    SELECT
      c.name AS client,
      MONTH(e.date_start) AS month,
      CONCAT(e.id, '-', RIGHT(DATE(e.date_start), 5)) AS item
    FROM
      event AS e
      INNER JOIN client AS c ON e.client_id = c.id
  ) AS derived
GROUP BY
  client
  WITH ROLLUP
;

As you can see, the query is calculating both the COUNTs and the GROUP_CONCATs both at the client level and at the whole set level. But each pair of the results is put inside an IF function, so that ultimately only one or the other result is returned in each column.

The condition to check is client IS NULL. If client happens to be null, that means that the current group represents the entire set and in that case each IF function chooses the COUNT result. When client value is not null, that means we are at the client level and each group of rows represents a specific client. In that case the GROUP_CONCAT result is returned, which is according to requirements, because for clients we must show concatenated strings.

Both solutions can be found at dbfiddle.uk.

Additional remarks

In my explanation above I tried to focus on the solutions and how they work. To avoid distractions, I allowed certain anti-patterns in my code that are worth mentioning.

  1. Reliance on precedence rules when implicitly converting data.

    The results of the functions COUNT() and GROUP_CONCAT() are different. One returns an integer, the other a string. When you are trying to put values of such different types into a single column, the server must decide which type to convert to which other type. It is good to know these rules but you should never rely on them in production code. That is just bad practice.

    In the queries above, a COUNT and a GROUP_CONCAT are either in the same column in different legs of the same query or in the same conditional that chooses between the two. In each case MySQL needs to apply its rules of type precedence. To avoid that, you can just explicitly cast each COUNT as a string:

    CAST(COUNT(...) AS char)
    
  2. Missing ORDER BY in GROUP_CONCAT.

    If you omit an ORDER BY, you are simply saying that you do not care if one time the query returns the string as A,B,C and the other as B,A,C and later as C,B,A. If you want your results to be predictable, always specify an ORDER BY and always use enough criteria to avoid ties.

    In the above queries the lines were simply very long already and I purposefully omitted the ORDER BYs for presentability's sake. The issue can be easily fixed with an ORDER BY like this:

    ORDER BY item ASC
    

    More specifically, in GROUP_CONCAT it would be used like this:

    GROUP_CONCAT(CASE month WHEN 12 THEN item END ORDER BY item ASC SEPARATOR ',')