Mysql – Filter and Aggregate a column not in the group by clause MySQL

group byMySQLmysql-5.7

I have the following table. I'm using mysql-5.7. Here is the dbfiddle

select * from report;
+----+---------------------+---------+------+------+------------+------+-------------------+
| id | date_hour           | speed   | name | nor  | validation | p_i  | speed_performance |
+----+---------------------+---------+------+------+------------+------+-------------------+
|  1 | 2018-02-03 13:23:00 |   -12.3 | SYN  |   10 | 1          | 3    | 4                 |
|  2 | 2018-02-03 11:23:00 |   -6.36 | SYN  |   13 | 0          | 4    | 5                 |
|  3 | 2018-02-03 01:23:00 |  -26.36 | SYN  |   24 | 0          | 2    | 4                 |
|  4 | 2020-04-06 21:23:00 | -156.36 | SYN  |   16 | 1          | 3    | 6                 |
|  5 | 2020-04-06 03:23:00 |  -36.36 | YRT  |  136 | 0          | 2    | 5                 |
|  6 | 2020-04-06 12:23:00 |  -16.36 | SYN  |   13 | 1          | 4    | 4                 |
+----+---------------------+---------+------+------+------------+------+-------------------+

I would like to group by date_hour such that after 12:00:00, it counts as the next day. The p_i column should be averaged based on this condition: take average of p_i where validation is 1 and p_i is non-zero. The values should be averaged based on the new version of date_hour. Additionally, another column (nor) is averaged without such a condition.

The output should look like this

+------+------------+----------+------+
| name | date_hr    | avg(nor) | p_i  |
+------+------------+----------+------+
| SYN  | 2018-02-03 |     18.5 | NULL |
| SYN  | 2018-02-04 |       10 |    3 |
| SYN  | 2020-04-07 |     14.5 |  3.5 |
| YRT  | 2020-04-06 |      136 | NULL |
+------+------------+----------+------+

I tried the following query. I had to disable only_full_group_by to get it to work:

select name, 
case when time(date_hour) > '12:00:00' 
then date(date_hour) + interval 1 day 
else date(date_hour) end as date_hr, 
avg(nor), 
case when validation = 1 and p_i > 0 then avg(p_i) else null end as p_i 
from report group by name, date_hr;

+------+------------+----------+------+
| name | date_hr    | avg(nor) | p_i  |
+------+------------+----------+------+
| SYN  | 2018-02-03 |     18.5 |    3 |
| SYN  | 2018-02-04 |       10 | NULL |
| SYN  | 2020-04-07 |     14.5 | NULL |
| YRT  | 2020-04-06 |      136 |    2 |
+------+------------+----------+------+

Best Answer

Seems easiest to just add 12 hours to date_hour:

select name, DATE(DATE_ADD(date_hour,INTERVAL 12 hour)) as date_hr
     , avg(nor)
     , avg( case when not (validation = 1 and p_i > 0) then p_i end) as p_i
from report
group by name, DATE(DATE_ADD(date_hour,INTERVAL 12 hour))
order by name, DATE(DATE_ADD(date_hour,INTERVAL 12 hour));

Fiddle

EDIT: Doing some experiments made me realize what might cause some confusion. Look at your validation column:

`validation` enum('0','1') DEFAULT NULL,

This means that:

COUNT(1) WHERE validation = 0  => 0
COUNT(1) WHERE validation = 1  => 3
COUNT(1) WHERE validation = 2  => 3

Whereas

COUNT(1) WHERE validation = '0'  => 3
COUNT(1) WHERE validation = '1'  => 3
COUNT(1) WHERE validation = '2'  => 0

My rephrased query would be:

select name, DATE(DATE_ADD(date_hour,INTERVAL 12 hour)) as date_hr
     , avg(nor)
     , avg( case when (validation = '1' and p_i <> '0') then p_i end) as p_i
from report
group by name, DATE(DATE_ADD(date_hour,INTERVAL 12 hour))
order by name, DATE(DATE_ADD(date_hour,INTERVAL 12 hour));

Beware that avg of a string may give weird results. Finally, ENUM in MySQL is a very weird construction that I would try to avoid. You may be interested in the following observation:

How does MariaDB handle ENUM types used in FOREIGN KEY constraints?

Updated Fiddle