MySQL – Grouping by 2-Hour DateTime Range

datetimeMySQLselect

Well, I have a table with some columns that some of these are repeated, but the time is not exactly the same, I need to do a group by that has a range of 2 hours.

Notice that all the fields marked in black repeat, but the datetime field may differ by a few minutes, so I need to make a group by joining everyone within that two-hour interval.

enter image description here

It would be something like: SELECT * FROM table GROUP BY sinistro, prefixo, data AND horaIfull_oco range 2h up or down

He would join all those who had a maximum of two hours apart. I have already researched and found nothing, if anyone knows how to do.

Best Answer

Sorry Woton, but your answer has several issues.

First of all, your usage of GROUP BY is not right. With GROUP BY you collapse several rows of a group to one. To get a meaningful result, all columns of your SELECT clause must either be included in the GROUP BY clause, or an aggregate function like MIN() or AVG() or SUM() has to be applied on it. Otherwise a random row of this group is displayed. So a SELECT * FROM foo GROUP BY bar is as good as always wrong. To make sure, you're doing it right, you could execute SET GLOBAL sql_mode = CONCAT(@@global.sql_mode, ',ONLY_FULL_GROUP_BY'); (for further information you can read this part of the manual)

The second issue with your query is, that the HAVING clause is a filter on the result of GROUP BY. Filters on columns that you are grouping by would be in the WHERE clause. Filters in the HAVING clause include an aggregate function, like HAVING SUM(foo) > 5000.
When you don't use an aggregate function, again a random row is used. Your HAVING clause therefore actually filters nothing at all. x is always greater than x - 5 and x is always smaller than x + 5.

A filter that would work in your case (though I don't think this is what you want) would for example be HAVING TIMESTAMPDIFF(MINUTE, MIN(horaIfull_oco), MAX(horaIfull_oco)) <= 150. This isn't what you want, right?

If I understood you correctly, you want to build a group when consecutive rows are within a certain timerange, right? For this we would have to generate a column first, which we can later use to group on.

Let's do that.

Here's your query, explanation follows:

SELECT sinistro, prefixo, data
/*whatever else you need*/
, MIN(horaIfull_oco), MAX(horaIfull_oco), MIN(id_oco) AS id_oco, GROUP_CONCAT(inclusao_oco) AS incl 
FROM (
    SELECT 
    t.* 
    , @group_number := IF(
        @prev_sinistro != sinistro 
        OR @prev_prefixo != prefixo 
        OR @prev_data != data 
        OR TIMESTAMPDIFF(MINUTE, @prev_horaIfull_oco, horaIfull_oco) >= 150
        , @group_number + 1 
        , @group_number) AS gn 
    , @prev_sinistro := sinistro
    , @prev_prefixo := prefixo
    , @prev_data := data 
    , @prev_horaIfull_oco := horaIfull_oco
    FROM your_table t 
    , (SELECT @prev_sinistro := NULL, @prev_prefixo := NULL, @prev_data := NULL, @prev_horaIfull_oco := NULL, @group_number := 0) v 
    ORDER BY sinistro, prefixo, data, horaIfull_oco
) sq 
GROUP BY sinistro, prefixo, data, gn 

First thing to notice, in a relational database there is no order of rows unless you specify it. So we ORDER BY the columns we want to use in our GROUP BY in the outer query and then by the timestamp/datetime column. The goal is, that we can use variables to access the previous row.
We don't want to execute the queries SET @prev_sinistro := NULL; SET @prev_prefixo := ... everytime before we execute the main query, so we initialize the variables in a subquery aliased with v here.
Then, every time sinistro or prefixo or data is different from the previous row, or when the difference between the previous rows horaIfull_oco and the value for this column in the current row is greater than 150 minutes, we increment the @group_number variable, if not we don't (we just assign the current value of the variable to the variable).
With @prev_prefixo := sinistro and so on we assign the value of the current row to the variable. It's important, that this is done after the IF() function. The SELECT clause is processed sequentially. When the next row is processed, the variables still hold the value of the previous row.

In the outer query finally, we can group by the generated gn column.

EDIT: To meet your further requirements, you could do this:

SELECT sinistro, prefixo, data, MIN(horaIfull_oco), MAX(horaIfull_oco)
FROM (
    SELECT 
    sinistro, prefixo, data, horaIfull_oco 
    , @group_number := IF(
        @prev_sinistro != sinistro 
        OR @prev_prefixo != prefixo 
        OR @prev_data != `data`
        OR TIMESTAMPDIFF(MINUTE, @prev_horaIfull_oco, horaIfull_oco) >= 150 
        OR DATE(@prev_horaIfull_oco) != DATE(horaIfull_oco)
        , @group_number + 1
        , @group_number) AS gn 
    , @prev_sinistro := sinistro
    , @prev_prefixo := prefixo
    , @prev_data := `data`
    , @prev_horaIfull_oco := horaIfull_oco
    FROM ocorrencia t 
    , (SELECT @prev_sinistro := NULL, @prev_prefixo := NULL, @prev_data := NULL, @prev_horaIfull_oco := NULL, @group_number := 0) v 
    ORDER BY sinistro, prefixo, data, horaIfull_oco
) sq 
GROUP BY sinistro, prefixo, data, gn;

EDIT 2:

So, I excluded data and used this query:

SELECT sinistro, prefixo -- , data
/*whatever else you need*/
, MIN(horaIfull_oco), MAX(horaIfull_oco), GROUP_CONCAT(id_oco) AS id_oco
FROM (
    SELECT 
    id_oco,
    sinistro, prefixo, /*data,*/ horaIfull_oco
    , @group_number := IF(
        @prev_sinistro != sinistro 
        OR @prev_prefixo != prefixo 
        -- OR @prev_data != data 
        OR TIMESTAMPDIFF(MINUTE, @prev_horaIfull_oco, horaIfull_oco) >= 150
        OR DATE(@prev_horaIfull_oco) != DATE(horaIfull_oco)
        , @group_number + 1 
        , @group_number) AS gn 
    , @prev_sinistro := sinistro
    , @prev_prefixo := prefixo
    -- , @prev_data := data 
    , @prev_horaIfull_oco := horaIfull_oco
    FROM ocorrencia t 
    , (SELECT @prev_sinistro := NULL, @prev_prefixo := NULL, @prev_data := NULL, @prev_horaIfull_oco := NULL, @group_number := 0) v 
    -- having id_oco in (254, 277)
    ORDER BY sinistro, prefixo, /*data,*/ horaIfull_oco
) sq 
GROUP BY sinistro, prefixo, /*data,*/ gn

Regarding your problems with id_oco 254 and 277, when you include

OR DATE(@prev_horaIfull_oco) != DATE(horaIfull_oco)

in the IF() function, these 2 records are in separate groups, if you don't they are in the same group because they are within the 2.5 hours range.

If you still have problems, please provide the desired result based on your sample data as well, to avoid further confusion.