Following the comment from FrustratedWithFormsDesigner, I arrived at the following solution:
SELECT subq2.*, sum(new_group) OVER (ORDER BY t ASC) AS group_id
FROM (
    SELECT subq.*, CASE WHEN delta > 1500 THEN 1 ELSE 0 END AS new_group
    FROM (
        SELECT t, lag(t) OVER (ORDER BY t ASC),
               t - lag(t) OVER (ORDER BY t ASC) AS delta
        FROM time_points
    ) AS subq
) AS subq2
The running sum of the new_group flags assigns each cluster of rows its own group_id.
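As a minimal sketch of this gaps-and-islands technique, assuming a hypothetical time_points table where t is a numeric timestamp: the inner query flags any row whose gap from the previous row exceeds 1500, and the running sum turns those flags into group IDs.

```sql
-- Hypothetical sample data; t is a numeric (epoch-style) timestamp.
WITH time_points(t) AS (
    VALUES (100), (700), (1200), (3500), (4000)
)
SELECT subq2.*, sum(new_group) OVER (ORDER BY t) AS group_id
FROM (
    SELECT subq.*, CASE WHEN delta > 1500 THEN 1 ELSE 0 END AS new_group
    FROM (
        SELECT t,
               lag(t) OVER (ORDER BY t) AS prev_t,
               t - lag(t) OVER (ORDER BY t) AS delta
        FROM time_points
    ) AS subq
) AS subq2;
-- 100, 700, 1200 land in group 0; the 2300-second gap before 3500
-- sets new_group = 1, so 3500 and 4000 land in group 1.
```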
Thank you very much!
I like tsrange. It's certainly not the only way to do this, but it's not error-prone and it's easy to read and write.
SELECT
grp.range,
sum(
EXTRACT(epoch FROM least(upper(grp.range),datahora_fim))
- EXTRACT(epoch FROM greatest(lower(grp.range),datahora_ini))
)
FROM (
SELECT
date_trunc('hour', min(datahora_ini)),
date_trunc('hour', max(datahora_fim))
FROM login
) AS bounds(min,max)
CROSS JOIN LATERAL generate_series(min, max, '1 hour') AS gs(start)
CROSS JOIN LATERAL tsrange(gs.start, gs.start + '1 hour') AS grp(range)
JOIN login ON grp.range && tsrange(datahora_ini,datahora_fim)
GROUP BY range
ORDER BY range;
range | sum
-----------------------------------------------+-------
["2017-06-02 08:00:00","2017-06-02 09:00:00") | 17821
["2017-06-02 09:00:00","2017-06-02 10:00:00") | 18000
["2017-06-02 10:00:00","2017-06-02 11:00:00") | 18000
["2017-06-02 11:00:00","2017-06-02 12:00:00") | 17079
["2017-06-02 12:00:00","2017-06-02 13:00:00") | 14400
["2017-06-02 13:00:00","2017-06-02 14:00:00") | 14363
["2017-06-02 14:00:00","2017-06-02 15:00:00") | 3716
["2017-06-02 15:00:00","2017-06-02 16:00:00") | 3600
["2017-06-02 16:00:00","2017-06-02 17:00:00") | 833
(9 rows)
The first part generates the hourly ranges for the data:
SELECT bounds.*, grp.*
FROM (
SELECT
date_trunc('hour', min(datahora_ini)),
date_trunc('hour', max(datahora_fim))
FROM login
) AS bounds(min,max)
CROSS JOIN LATERAL generate_series(min, max, '1 hour') AS gs(start)
CROSS JOIN LATERAL tsrange(gs.start, gs.start + '1 hour') AS grp(range)
ORDER BY range;
min | max | range
---------------------+---------------------+-----------------------------------------------
2017-06-02 08:00:00 | 2017-06-02 17:00:00 | ["2017-06-02 08:00:00","2017-06-02 09:00:00")
2017-06-02 08:00:00 | 2017-06-02 17:00:00 | ["2017-06-02 09:00:00","2017-06-02 10:00:00")
2017-06-02 08:00:00 | 2017-06-02 17:00:00 | ["2017-06-02 10:00:00","2017-06-02 11:00:00")
2017-06-02 08:00:00 | 2017-06-02 17:00:00 | ["2017-06-02 11:00:00","2017-06-02 12:00:00")
2017-06-02 08:00:00 | 2017-06-02 17:00:00 | ["2017-06-02 12:00:00","2017-06-02 13:00:00")
2017-06-02 08:00:00 | 2017-06-02 17:00:00 | ["2017-06-02 13:00:00","2017-06-02 14:00:00")
2017-06-02 08:00:00 | 2017-06-02 17:00:00 | ["2017-06-02 14:00:00","2017-06-02 15:00:00")
2017-06-02 08:00:00 | 2017-06-02 17:00:00 | ["2017-06-02 15:00:00","2017-06-02 16:00:00")
2017-06-02 08:00:00 | 2017-06-02 17:00:00 | ["2017-06-02 16:00:00","2017-06-02 17:00:00")
(9 rows)
The second part joins them back to the original data, and:
- Pulls the seconds since epoch of the lesser of the range's upper bound and datahora_fim; the upper bound of the range is the ceiling.
- Pulls the seconds since epoch of the greater of the range's lower bound and datahora_ini; the lower bound of the range is the floor.
- Subtracts the two to get the overlap in seconds.
- Sums it up per range.
That looks like this:
sum(
EXTRACT(epoch FROM least(upper(grp.range),datahora_fim))
- EXTRACT(epoch FROM greatest(lower(grp.range),datahora_ini))
)
This method can use an expression (functional) index on tsrange(datahora_ini, datahora_fim).
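A sketch of such an index (the index name and access method are assumptions; a GiST index is what supports the && overlap operator used in the join):

```sql
-- Hypothetical expression index so the overlap join can use it.
CREATE INDEX login_period_gist_idx
    ON login
    USING gist (tsrange(datahora_ini, datahora_fim));
```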
Your original query shows only the lower bound of the tsrange; if you prefer that output, just use lower(grp.range).
Best Answer
There are a number of ways to do this, but if you only need a monthly period starting on a given date, I would use a variable for the day of the month to start on and just calculate a new column with that shifted month. Something like:
Apologies for this being T-SQL syntax (I'm not a PL/SQL guy), but in case you need help on declaring variables: http://www.postgresqltutorial.com/plpgsql-variables/
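A minimal PostgreSQL sketch of that idea, shifting each timestamp back so months "start" on an arbitrary day; the function name, the pivot day (15), and the table/column names are all assumptions, not from the original answer:

```sql
-- Hypothetical helper: map a timestamp to the custom month it belongs to,
-- where months begin on start_day instead of the 1st.
CREATE FUNCTION custom_month(ts timestamp, start_day int)
RETURNS date
LANGUAGE sql IMMUTABLE AS $$
    SELECT date_trunc('month', ts - make_interval(days => start_day - 1))::date;
$$;

-- Usage sketch: group logins by months running from the 15th to the 14th.
SELECT custom_month(datahora_ini, 15) AS period, count(*)
FROM login
GROUP BY period
ORDER BY period;
```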