PostgreSQL – Filter created_at and Count Distinct Minutes

aggregate postgresql timestamp

I have this query to get records from a database:

SELECT camera_id, count(*) AS snapshot_count
FROM   snapshots
WHERE  created_at >= timestamp 'yesterday'
AND    created_at <  timestamp 'today'
GROUP  BY camera_id;

I want to add some filters on created_at to get the count for matching rows only. The user will provide camera_id, date, time, and days, for example:

From date: 2015/01/01 
to date: 2015/12/30
Schedule: Monday-Friday
Timings: 9 AM to 5 pm UTC

How do I apply all of these to created_at to get only the matching records?

Update:

I also want a count per minute. If 1 minute contains more than 1 image, it shall still be counted as one. You could say: only count the first snapshot per minute.

In short, if there is only 1 snapshot in 60 minutes, that is a count of 1 for the 60 minutes between 9 AM and 10 AM, and so on until 5 PM.

Best Answer

For lack of information, assuming created_at is data type timestamp, saving UTC timestamps:

SELECT camera_id, count(*) AS snapshot_count
FROM   snapshots
WHERE  created_at >= timestamp '2015-01-01'   -- From date: 2015/01/01 
AND    created_at <  timestamp '2015-12-31'   -- to date: 2015/12/30 
                                              -- chop off '2015-12-31'?
AND    created_at::time BETWEEN time '09:00' AND time '17:00'
                                              -- 9 AM to 5 PM UTC, both bounds included
AND    EXTRACT('ISODOW' FROM created_at) < 6  -- Monday-Friday
GROUP  BY camera_id;
  • The check for date seems trivial - but be careful what exactly to include / exclude at the upper and lower bounds.

  • For the time component cast to time: created_at::time.

  • For the day of week use EXTRACT() with the field ISODOW (simpler for your kind of schedule than DOW, because Monday to Sunday map to 1 to 7).

    In case you are not aware of the difference between the data types timestamp and timestamptz and the role of the timezone settings and DST, now is the time to read up:

  • Ignoring timezones altogether in Rails and PostgreSQL
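To make the three filters above concrete, here is a small sketch that evaluates them on a few literal timestamp values instead of the real table. The sample values and the column aliases (time_of_day, isodow, dow, in_schedule) are made up for illustration; only the expressions come from the query above.

```sql
-- Hypothetical sample values, plain timestamp (no time zone), read as UTC.
SELECT ts
     , ts::time                 AS time_of_day
     , EXTRACT(ISODOW FROM ts)  AS isodow   -- Monday = 1 ... Sunday = 7
     , EXTRACT(DOW FROM ts)     AS dow      -- Sunday = 0 ... Saturday = 6
     , (ts::time BETWEEN time '09:00' AND time '17:00')
       AND EXTRACT(ISODOW FROM ts) < 6      AS in_schedule
FROM  (VALUES
         (timestamp '2015-01-04 12:00:00')  -- Sunday:  isodow 7, dow 0, in_schedule false
       , (timestamp '2015-01-05 09:00:00')  -- Monday:  isodow 1, dow 1, in_schedule true
       , (timestamp '2015-01-09 17:00:01')  -- Friday:  isodow 5, but after 17:00, false
      ) AS t(ts);
```

With ISODOW, "Monday-Friday" is a single comparison (< 6). With DOW you would need BETWEEN 1 AND 5, because Sunday is 0 there.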

Count minutes with multiple rows as one

For your added Update:

SELECT camera_id
     , count(DISTINCT date_trunc('minute', created_at)) AS snapshot_minute_count
FROM ...

Effectively, minutes are counted in which at least one (filtered) snapshot was taken. If 7 snapshots were taken in the same minute, they still count as one.
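A quick way to see the difference between the two counts, again on made-up literal values rather than the real snapshots table:

```sql
-- Three hypothetical snapshots: two in minute 09:00, one in minute 09:01.
SELECT count(*)                                         AS snapshot_count         -- 3
     , count(DISTINCT date_trunc('minute', created_at)) AS snapshot_minute_count  -- 2
FROM  (VALUES
         (timestamp '2015-01-05 09:00:05')
       , (timestamp '2015-01-05 09:00:40')
       , (timestamp '2015-01-05 09:01:10')
      ) AS s(created_at);
```

date_trunc('minute', ...) sets seconds and fractional seconds to zero, so the first two rows collapse to the same value and DISTINCT counts them once.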

Performance

The filters on time and day of week are not sargable. If your table is big and performance is important, create functional indexes to change that - using IMMUTABLE expressions. Depending on your actual data type, you need to create an IMMUTABLE wrapper function that works independently of the timezone setting, because neither the cast to time nor EXTRACT() (internally the function date_part()) is defined IMMUTABLE for various reasons.

CREATE FUNCTION f_to_time(timestamp)
  RETURNS time LANGUAGE sql IMMUTABLE COST 5 AS
'SELECT $1::time';

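-- For a plain timestamp holding UTC values, do not apply AT TIME ZONE here:
-- that would produce a timestamptz, and EXTRACT() would then depend on the session time zone.
CREATE OR REPLACE FUNCTION f_to_isodow(timestamp)
  RETURNS int LANGUAGE sql IMMUTABLE COST 20 AS
$$SELECT EXTRACT('ISODOW' FROM $1)::int$$;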

Then:

CREATE INDEX snapshots_created_at_time_idx ON snapshots (f_to_time(created_at));
CREATE INDEX snapshots_created_at_isodow_idx ON snapshots (f_to_isodow(created_at));

To utilize these indexes, the query has to use the same expressions:

SELECT camera_id
     , count(*) AS snapshot_count
     , count(DISTINCT date_trunc('minute', created_at)) AS snapshot_minute_count
FROM   snapshots
WHERE  created_at >= timestamp '2015-01-01'
AND    created_at <  timestamp '2015-12-31'
AND    f_to_time(created_at) BETWEEN time '09:00' AND time '17:00'
AND    f_to_isodow(created_at) < 6
GROUP  BY camera_id;
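Whether the expression indexes actually get used depends on the planner's cost estimates, so it is worth checking the plan on your own data. A minimal sketch, assuming the wrapper function f_to_time() and the indexes from above exist:

```sql
-- EXPLAIN ANALYZE executes the query and reports the plan that was actually chosen.
EXPLAIN (ANALYZE, BUFFERS)
SELECT camera_id
     , count(*) AS snapshot_count
FROM   snapshots
WHERE  created_at >= timestamp '2015-01-01'
AND    created_at <  timestamp '2015-12-31'
AND    f_to_time(created_at) BETWEEN time '09:00' AND time '17:00'
AND    f_to_isodow(created_at) < 6
GROUP  BY camera_id;
```

If the output shows a sequential scan, or only an index on created_at (should one exist), the planner has estimated that the time and weekday filters are not selective enough for their indexes to pay off. That is plausible here: Monday to Friday, 9 to 5, still covers a large share of all rows.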