MySQL – Grouping by 2-Hour DateTime Range

datetimeMySQLselect

Well, I have a table with some columns that some of these are repeated, but the time is not exactly the same, I need to do a group by that has a range of 2 hours.

Notice that all the fields marked in black repeat, but the datetime field may differ by a few minutes, so I need to make a group by joining everyone within that two-hour interval.

It would be something like: SELECT * FROM table GROUP BY sinistro, prefixo, data AND horaIfull_oco range 2h up or down

He would join all those who had a maximum of two hours apart. I have already researched and found nothing, if anyone knows how to do.

Best Answer

Sorry Woton, but your answer has several issues.

First of all, your usage of GROUP BY is not right. With GROUP BY you collapse several rows of a group to one. To get a meaningful result, all columns of your SELECT clause must either be included in the GROUP BY clause, or an aggregate function like MIN() or AVG() or SUM() has to be applied on it. Otherwise a random row of this group is displayed. So a SELECT * FROM foo GROUP BY bar is as good as always wrong. To make sure, you're doing it right, you could execute SET GLOBAL sql_mode = CONCAT(@@global.sql_mode, ',ONLY_FULL_GROUP_BY'); (for further information you can read this part of the manual)

The second issue with your query is, that the HAVING clause is a filter on the result of GROUP BY. Filters on columns that you are grouping by would be in the WHERE clause. Filters in the HAVING clause include an aggregate function, like HAVING SUM(foo) > 5000.
When you don't use an aggregate function, again a random row is used. Your HAVING clause therefore actually filters nothing at all. x is always greater than x - 5 and x is always smaller than x + 5.

A filter that would work in your case (though I don't think this is what you want) would for example be HAVING TIMESTAMPDIFF(MINUTE, MIN(horaIfull_oco), MAX(horaIfull_oco)) <= 150. This isn't what you want, right?

If I understood you correctly, you want to build a group when consecutive rows are within a certain timerange, right? For this we would have to generate a column first, which we can later use to group on.

Let's do that.

Here's your query, explanation follows:

SELECT sinistro, prefixo, data
/*whatever else you need*/
, MIN(horaIfull_oco), MAX(horaIfull_oco), MIN(id_oco) AS id_oco, GROUP_CONCAT(inclusao_oco) AS incl 
FROM (
    SELECT 
    t.* 
    , @group_number := IF(
        @prev_sinistro != sinistro 
        OR @prev_prefixo != prefixo 
        OR @prev_data != data 
        OR TIMESTAMPDIFF(MINUTE, @prev_horaIfull_oco, horaIfull_oco) >= 150
        , @group_number + 1 
        , @group_number) AS gn 
    , @prev_sinistro := sinistro
    , @prev_prefixo := prefixo
    , @prev_data := data 
    , @prev_horaIfull_oco := horaIfull_oco
    FROM your_table t 
    , (SELECT @prev_sinistro := NULL, @prev_prefixo := NULL, @prev_data := NULL, @prev_horaIfull_oco := NULL, @group_number := 0) v 
    ORDER BY sinistro, prefixo, data, horaIfull_oco
) sq 
GROUP BY sinistro, prefixo, data, gn

First thing to notice, in a relational database there is no order of rows unless you specify it. So we ORDER BY the columns we want to use in our GROUP BY in the outer query and then by the timestamp/datetime column. The goal is, that we can use variables to access the previous row.
We don't want to execute the queries SET @prev_sinistro := NULL; SET @prev_prefixo := ... everytime before we execute the main query, so we initialize the variables in a subquery aliased with v here.
Then, every time sinistro or prefixo or data is different from the previous row, or when the difference between the previous rows horaIfull_oco and the value for this column in the current row is greater than 150 minutes, we increment the @group_number variable, if not we don't (we just assign the current value of the variable to the variable).
With @prev_prefixo := sinistro and so on we assign the value of the current row to the variable. It's important, that this is done after the IF() function. The SELECT clause is processed sequentially. When the next row is processed, the variables still hold the value of the previous row.

In the outer query finally, we can group by the generated gn column.

EDIT: To meet your further requirements, you could do this:

SELECT sinistro, prefixo, data, MIN(horaIfull_oco), MAX(horaIfull_oco)
FROM (
    SELECT 
    sinistro, prefixo, data, horaIfull_oco 
    , @group_number := IF(
        @prev_sinistro != sinistro 
        OR @prev_prefixo != prefixo 
        OR @prev_data != `data`
        OR TIMESTAMPDIFF(MINUTE, @prev_horaIfull_oco, horaIfull_oco) >= 150 
        OR DATE(@prev_horaIfull_oco) != DATE(horaIfull_oco)
        , @group_number + 1
        , @group_number) AS gn 
    , @prev_sinistro := sinistro
    , @prev_prefixo := prefixo
    , @prev_data := `data`
    , @prev_horaIfull_oco := horaIfull_oco
    FROM ocorrencia t 
    , (SELECT @prev_sinistro := NULL, @prev_prefixo := NULL, @prev_data := NULL, @prev_horaIfull_oco := NULL, @group_number := 0) v 
    ORDER BY sinistro, prefixo, data, horaIfull_oco
) sq 
GROUP BY sinistro, prefixo, data, gn;

EDIT 2:

So, I excluded data and used this query:

SELECT sinistro, prefixo -- , data
/*whatever else you need*/
, MIN(horaIfull_oco), MAX(horaIfull_oco), GROUP_CONCAT(id_oco) AS id_oco
FROM (
    SELECT 
    id_oco,
    sinistro, prefixo, /*data,*/ horaIfull_oco
    , @group_number := IF(
        @prev_sinistro != sinistro 
        OR @prev_prefixo != prefixo 
        -- OR @prev_data != data 
        OR TIMESTAMPDIFF(MINUTE, @prev_horaIfull_oco, horaIfull_oco) >= 150
        OR DATE(@prev_horaIfull_oco) != DATE(horaIfull_oco)
        , @group_number + 1 
        , @group_number) AS gn 
    , @prev_sinistro := sinistro
    , @prev_prefixo := prefixo
    -- , @prev_data := data 
    , @prev_horaIfull_oco := horaIfull_oco
    FROM ocorrencia t 
    , (SELECT @prev_sinistro := NULL, @prev_prefixo := NULL, @prev_data := NULL, @prev_horaIfull_oco := NULL, @group_number := 0) v 
    -- having id_oco in (254, 277)
    ORDER BY sinistro, prefixo, /*data,*/ horaIfull_oco
) sq 
GROUP BY sinistro, prefixo, /*data,*/ gn

Regarding your problems with id_oco 254 and 277, when you include

OR DATE(@prev_horaIfull_oco) != DATE(horaIfull_oco)

in the IF() function, these 2 records are in separate groups, if you don't they are in the same group because they are within the 2.5 hours range.

If you still have problems, please provide the desired result based on your sample data as well, to avoid further confusion.

Related Solutions

Mysql – How to compare (in MySQL) a DATETIME value to a TIMESTAMP value

There is a manual page on Time and Date functions.

The best way to compare date and time and get minutes back is probably to use one of the following:

SELECT TIME_TO_SEC(TIMEDIFF(timea,timeb)+0)/60
SELECT TIMESTAMPDIFF(MINUTE,timeb,timea)

Note that while TIMEDIFF works with times, TIMESTAMPDIFF requires a date component - and the values are switched around for TIMESTAMPDIFF.

You can return hours with either of these:

SELECT TIME_TO_SEC(TIMEDIFF(timea,timeb)+0)/60/60
SELECT TIMESTAMPDIFF(HOUR,timeb,timea)

Note that when using DATEDIFF, the value returned is a difference between the date components of the given time - thus, a value of 23:59 on one day compared to a value of 0:01 on the next (a 2 minute difference) results in a date difference of 1 (a full day).

To get a difference between two times in 24-hour periods (length of a day) first calculate the difference in minutes then divide by the proper values to find the number of 24-hour periods. Try this:

SELECT TRUNCATE(TIMESTAMPDIFF(MINUTE,timeb,timea)/60.0/24.0,0)

This truncates the decimal portion, returning only the number of complete 24-hour periods in the specified timespan.

MySQL: Why is DATEDIFF and TIMEDIFF/60.0/60.0/24.0 results different

The answer lies in fully understanding the TIMEDIFF function and what it returns as an integer.

First, let's consider TIMEDIFF in its native element and what it returns:

mysql> select timediff('0:0:0','1:0:0') as timediff;
+-----------+
| timediff  |
+-----------+
| -01:00:00 | 
+-----------+
1 row in set (0.00 sec)

Second, what does TIMEDIFF return in an integer context - done by adding zero:

mysql> select timediff('0:0:0','1:0:0')+0 as timediff2;
+--------------+
| timediff2    |
+--------------+
| 10000.000000 | 
+--------------+
1 row in set (0.01 sec)

Now, we use the TIME_TO_SEC function to return what we actually want:

mysql> select time_to_sec(timediff('0:0:0','1:0:0')+0)/60 as timediff3;
+-----------+
| timediff3 |
+-----------+
|   60.0000 | 
+-----------+
1 row in set (0.00 sec)

The integer value returned by TIMEDIFF is not minutes or seconds, but rather a base-10 representation of the actual time - so 01:00:00 becomes 10000, and 05:05:01 becomes 50501.

Instead of using TIME_TO_SEC one can also use TIMESTAMPDIFF instead of TIMEDIFF and set the units to whatever is desired:

mysql> select timestampdiff(MINUTE,'2012-04-13 0:0:0','2012-04-13 1:0:0') as timediff4;
+-----------+
| timediff4 |
+-----------+
|        60 | 
+-----------+
1 row in set (0.00 sec)

Instead of MINUTE, you can also use FRAC_SECOND (microseconds), SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, or YEAR.

The time functions in MySQL (in this case, version 5.1) are all described in the manual. The online manual also links to other versions (such as MySQL 5.5) as well.

Best Answer

Related Solutions

Mysql – How to compare (in MySQL) a DATETIME value to a TIMESTAMP value

MySQL: Why is DATEDIFF and TIMEDIFF/60.0/60.0/24.0 results different

Related Question