I query the YouTube Data API for a list of the most popular videos on a channel and then fetch their statistics, 4 times per hour (every 15 minutes, via cron). The data is stored in Postgres, but dumping it and loading it into another SQL DB wouldn't be a problem.
Now I have the following table of data:
video_id | views_count | likes_count | timestamp
---------+-------------+-------------+---------------------
foo      |         100 |           1 | 2018-12-01 12:01:03
foo      |         101 |           1 | 2018-12-01 12:16:06
foo      |         105 |           1 | 2018-12-01 12:31:01
bar      |         199 |           0 | 2018-12-01 12:01:02
bar      |         200 |           0 | 2018-12-01 12:16:08
bar      |         301 |           5 | 2018-12-01 12:31:02
... | ...
UPD: Here's the schema (pasted to sqlfiddle):
CREATE TABLE video_statistics
(
    video_id    TEXT        NOT NULL,
    views_count INTEGER     NOT NULL,
    likes_count INTEGER     NOT NULL,
    "timestamp" TIMESTAMPTZ NOT NULL
);
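For reference, the sample rows shown above can be loaded into that schema like so (values copied from the question's table; the column name timestamp is quoted since it collides with a type keyword):

```sql
-- Load the sample data from the question into video_statistics.
INSERT INTO video_statistics (video_id, views_count, likes_count, "timestamp")
VALUES ('foo', 100, 1, '2018-12-01 12:01:03'),
       ('foo', 101, 1, '2018-12-01 12:16:06'),
       ('foo', 105, 1, '2018-12-01 12:31:01'),
       ('bar', 199, 0, '2018-12-01 12:01:02'),
       ('bar', 200, 0, '2018-12-01 12:16:08'),
       ('bar', 301, 5, '2018-12-01 12:31:02');
```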
How should I query that data to get per-hour increments in the views_count and likes_count columns, grouped by video?
To clarify what I want to get:
hour_of_day|video_id|views_increment|likes_increment
-----------+--------+---------------+---------------
... | ...
11 | foo | 4 | 0
12 | foo | 5 | 1
... | ...
11 | bar | 73 | 0
12 | bar | 102 | 5
... | ...
In other words, I'm after a "best time to post a video" metric based on historical data, taking many weeks and months of observations into account.
Should I rather dump the data into a time-series DB, or some other database more appropriate for such cases, and query it there? Or should I just resort to calculating this in code?
Best Answer
One possibility is to first row_number() the records to get the first and last value per video, day, and hour. Then join the two sets of first and last values to get the respective differences. Finally, group the result by video and hour and take the sum or the average per video per day.
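A sketch of that approach against the question's schema (untested against real data; bucketing with date_trunc and averaging the per-hour increments over all observed days are my reading of the steps above):

```sql
-- 1. Rank each video's samples within every (video, day+hour) bucket,
--    ascending and descending, to find the first and last reading.
WITH ranked AS (
    SELECT video_id,
           views_count,
           likes_count,
           date_trunc('hour', "timestamp") AS hour_bucket,
           row_number() OVER (PARTITION BY video_id, date_trunc('hour', "timestamp")
                              ORDER BY "timestamp" ASC)  AS rn_first,
           row_number() OVER (PARTITION BY video_id, date_trunc('hour', "timestamp")
                              ORDER BY "timestamp" DESC) AS rn_last
    FROM video_statistics
),
-- 2. Join first and last readings of each bucket to get the increments.
diffs AS (
    SELECT f.video_id,
           f.hour_bucket,
           l.views_count - f.views_count AS views_increment,
           l.likes_count - f.likes_count AS likes_increment
    FROM ranked f
    JOIN ranked l
      ON l.video_id    = f.video_id
     AND l.hour_bucket = f.hour_bucket
     AND l.rn_last     = 1
    WHERE f.rn_first = 1
)
-- 3. Average the increments per video and hour of day, across all days.
SELECT extract(hour FROM hour_bucket)::int AS hour_of_day,
       video_id,
       avg(views_increment) AS views_increment,
       avg(likes_increment) AS likes_increment
FROM diffs
GROUP BY hour_of_day, video_id
ORDER BY video_id, hour_of_day;
```

Note one limitation of this sketch: it only measures growth between the first and last sample inside each hour, so the ~15-minute gap between one hour's last sample and the next hour's first sample is not attributed to either bucket.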