PostgreSQL – Counting 3 Consecutive Unsynced Dates

gaps-and-islandspostgresqlwindow functions

I have a table set out like so:

user_id  the_date    minutes_asleep         
----------------------------
1       2015-01-01   480
1       2015-01-02   0
1       2015-01-03   0
1       2015-01-04   0
1       2015-01-05   321
1       2015-01-06   0
1       2015-01-07   0
1       2015-01-08   0
2       2015-01-01   567
2       2015-01-02   0
2       2015-01-03   285
2       2015-01-04   0
2       2015-01-05   577
2       2015-01-06   0
2       2015-01-07   0
2       2015-01-08   0

I need to find the count of groups of 3 consecutive dates where minutes asleep is 0. So in the example above user_id 1 would have a count of 2, user_id 2 would have a count of 1. The total count would be 3. So the returning result would be:

total_count
-----------
3

In the case that there are 5 consecutive dates, the count should only be 1. If there are 6 consecutive, the count should be 2 etc.

A user will only ever have one record per date.

Test table:

CREATE TABLE fitness_data (
    id serial NOT NULL,
    the_date date,
    minutes_asleep integer,
    user_id integer
);

INSERT INTO fitness_data (the_date, minutes_asleep,user_id) VALUES
('2015-01-01',480,1),('2015-01-02',0,1),('2015-01-03',0,1),('2015-01-04',0,1),('2015-01-05',321,1),('2015-01-06',0,1),('2015-01-07',0,1),('2015-01-08',0,1),('2015-01-01',567,2),('2015-01-02',0,2),('2015-01-03',285,2),('2015-01-04',0,2),('2015-01-05',577,2),('2015-01-06',0,2),('2015-01-07',0,2),('2015-01-08',0,2)

Best Answer

This counts ranges of at least 3 days with minutes_asleep = 0.
A continuous range of 5 days still counts as 1.
A continuous range of 6 days counts as 2. Etc.
Ignoring all other entries where minutes_asleep is different.

SELECT sum(ct) AS total_count
FROM  (
   SELECT (count(*)/3)::int AS ct  -- integer division truncates as desired + see below
   FROM (
      SELECT user_id
           , the_date - (row_number() OVER (PARTITION BY user_id
                                            ORDER BY the_date))::int AS grp
      FROM   fitness_data
      WHERE  minutes_asleep = 0
      ) sub1
   GROUP  BY user_id, grp
   HAVING count(*) >= 3
   ) sub2
UNION ALL  -- see below
SELECT 0
LIMIT 1;

UNION ALL ... is just to return 0 instead of "no row" where no qualifying ranges are found. You could also use COALESCE in another subselect, but I find this more elegant.

Why (count(*)/3)::int? count() returns bigint. sum(bigint) returns numeric. I am pretty positive we don't need to go that far. Casting back to int, so we get a bigint result. Should be faster, too.

Related answer with more explanation: