PostgreSQL Aggregate – Summarizing Data into Count Per Hour and Average Per Day


I need to summarise a dataset to show the hourly totals per day. The dataset contains travel card transactions for a period of 4 weeks. I would like to see the total transactions per hour (00:00-01:00, 01:00-02:00, …, 23:00-00:00) for each of the 28 days.

If this is achievable, I would then like to resample the data to show the average transactions per hour for weekdays and weekends, respectively.

The column datetime is of type timestamp without time zone, i.e. 'YYYY-MM-DD HH:MM:SS'.

Is this something PostgreSQL can accomplish?

Best Answer

date_trunc() is the key.
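As a quick illustration (the timestamp is an arbitrary example), date_trunc('hour', ...) cuts off minutes and seconds, so all transactions within the same hour collapse to a single value that can be grouped on:

```sql
-- Truncate an arbitrary example timestamp to the start of its hour.
SELECT date_trunc('hour', timestamp '2020-12-01 13:47:12') AS hour;
-- hour: 2020-12-01 13:00:00
```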

Transactions per hour

SELECT date_trunc('hour', datetime) AS hour
     , count(*) AS transactions
FROM   tbl
GROUP  BY 1
ORDER  BY 1;
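To get output closer to the layout in the question (one row per day and hour slot, such as 07:00-08:00), the hourly counts can be labeled with to_char(). This is only a sketch: the CTE named tbl with a few made-up rows is a hypothetical stand-in for the real table, and the hour_slot label format is an assumption.

```sql
-- Hypothetical sample data standing in for the real table "tbl".
WITH tbl(datetime) AS (
   VALUES (timestamp '2020-12-01 07:05:00')
        , (timestamp '2020-12-01 07:40:00')
        , (timestamp '2020-12-01 08:15:00')
        , (timestamp '2020-12-01 23:30:00')
   )
SELECT hour::date AS day
       -- label each hour as 'HH:MI-HH:MI'; 23:00 becomes '23:00-00:00'
     , to_char(hour, 'HH24:MI') || '-'
       || to_char(hour + interval '1 hour', 'HH24:MI') AS hour_slot
     , transactions
FROM  (
   SELECT date_trunc('hour', datetime) AS hour
        , count(*) AS transactions
   FROM   tbl
   GROUP  BY 1
   ) sub
ORDER  BY hour;
```

With the sample rows this returns three rows for 2020-12-01: 07:00-08:00 with 2 transactions, 08:00-09:00 with 1 and 23:00-00:00 with 1. Hours without any transactions are still missing at this point.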

Average transactions per hour

... for weekday and weekend, respectively.

SELECT extract('ISODOW' from hour)::int/6 AS weekday_weekend
     , round(avg(transactions), 2) AS avg_transactions
FROM  (
   SELECT date_trunc('hour', datetime) AS hour
        , count(*) AS transactions
   FROM   tbl
   GROUP  BY 1
   ) sub
GROUP  BY 1;

0 .. weekday
1 .. weekend

ISODOW returns 1 to 7 for Monday to Sunday, and Saturday and Sunday count as "weekend". Integer division x/6 therefore returns 0 for weekdays (1 to 5) and 1 for weekend days (6 and 7).
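The mapping can be checked over one arbitrary calendar week (2020-12-07 was a Monday):

```sql
-- One row per day from Monday 2020-12-07 to Sunday 2020-12-13.
SELECT d::date AS day
     , to_char(d, 'Dy') AS day_name
     , extract('ISODOW' from d)::int AS isodow
     , extract('ISODOW' from d)::int/6 AS weekday_weekend  -- integer division
FROM   generate_series(timestamp '2020-12-07', timestamp '2020-12-13', interval '1 day') d;
```

weekday_weekend comes out as 0 for Mon to Fri (isodow 1 to 5) and 1 for Sat and Sun (isodow 6 and 7).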

Include hours without activity

The queries above only see hours with actual transactions. To include hours without any transactions (no rows in the table) with a count of 0, start from an hourly grid covering the whole 28-day period, LEFT JOIN the aggregated counts to it, and use COALESCE() to turn the missing counts into 0. Like:

SELECT extract('ISODOW' from hour)::int/6 AS weekday_weekend
     , round(avg(COALESCE(transactions, 0)), 2) AS avg_transactions
FROM  (
   SELECT generate_series(timestamp '2020-12-01' -- first day
                        , timestamp '2020-12-28' -- last day
                          + interval '1 day - 1 hour'
                        , interval '1 hour')
   ) grid(hour)
LEFT JOIN (
   SELECT date_trunc('hour', datetime) AS hour
        , count(*) AS transactions
   FROM   tbl
   GROUP  BY 1
   ) sub USING (hour)
GROUP  BY 1;
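If "average transactions per hour" is meant as an hourly profile (one average for each hour of the day, separately for weekdays and weekends) rather than a single average over all hours, the hour of day can be added to the grouping. A sketch under that assumption; the CTE tbl with two made-up rows is a hypothetical stand-in for the real table:

```sql
-- Hypothetical sample data standing in for the real table "tbl".
WITH tbl(datetime) AS (
   VALUES (timestamp '2020-12-05 10:15:00')  -- a Saturday
        , (timestamp '2020-12-07 08:30:00')  -- a Monday
   )
SELECT extract('ISODOW' from hour)::int/6 AS weekday_weekend
     , extract('HOUR' from hour)::int     AS hour_of_day
     , round(avg(COALESCE(transactions, 0)), 2) AS avg_transactions
FROM  (
   SELECT generate_series(timestamp '2020-12-01'
                        , timestamp '2020-12-28' + interval '1 day - 1 hour'
                        , interval '1 hour')
   ) grid(hour)
LEFT JOIN (
   SELECT date_trunc('hour', datetime) AS hour
        , count(*) AS transactions
   FROM   tbl
   GROUP  BY 1
   ) sub USING (hour)
GROUP  BY 1, 2   -- weekday/weekend x hour of day = 48 rows
ORDER  BY 1, 2;
```

With the two sample rows, the weekend row for hour 10 shows 0.13 (1 transaction over the 8 weekend days in the grid), the weekday row for hour 8 shows 0.05 (1 over 20 weekdays), and all other rows show 0.00.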

The expression timestamp '2020-12-28' + interval '1 day - 1 hour' produces the latest hour of interest for the time series.
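A quick check of that expression, plus the resulting size of the grid for the example period 2020-12-01 to 2020-12-28 (28 days x 24 hours):

```sql
-- Last hour of the grid: start of the last day plus 1 day minus 1 hour.
SELECT timestamp '2020-12-28' + interval '1 day - 1 hour' AS last_hour;
-- last_hour: 2020-12-28 23:00:00

-- Number of hourly grid rows for the 28-day period.
SELECT count(*) AS grid_hours
FROM   generate_series(timestamp '2020-12-01'
                     , timestamp '2020-12-28' + interval '1 day - 1 hour'
                     , interval '1 hour');
-- grid_hours: 672
```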

db<>fiddle here

Further reading: