PostgreSQL – How to Group by ID and Calculate Average Time Delta

aggregatepostgresqlpostgresql-14

I have a table with ids and timestamps:

postgres=> select id_user, date_seconds from orders.orders order by id_user limit 10;
           id_user            |    date_seconds
------------------------------+---------------------
 00D4jzHHesfCXGO6VQfUun9JcCW2 | 2021-09-20 02:00:00
 00D4jzHHesfCXGO6VQfUun9JcCW2 | 2021-10-06 02:00:00
 00D4jzHHesfCXGO6VQfUun9JcCW2 |
 00d9QGWjmbYdRTGw4h4GynhDp2K2 | 2021-10-01 02:00:00
 00OI3ruCKeOFdIro11eVNcfp84A2 | 2021-10-06 02:00:00
 00OI3ruCKeOFdIro11eVNcfp84A2 | 2021-08-24 02:00:00
 00qwpdw23BSEBGkfJhR4Hb7Kg1v2 | 2021-09-17 02:00:00
 00qwpdw23BSEBGkfJhR4Hb7Kg1v2 | 2021-09-20 02:00:00
 00qwpdw23BSEBGkfJhR4Hb7Kg1v2 | 2021-10-04 02:00:00
 00qwpdw23BSEBGkfJhR4Hb7Kg1v2 | 2021-08-20 02:00:00
(10 rows)

As you can see, one id_user can appear multiple times in this table.

I would like to group this table by id_user. However, I would also want to have the average delta time of the date_seconds column between the rows of that particular id_user.

I figured I might need to use the LAG psql function, but I am unsure how or if I`m even on the right track here.

Examples as suggested by the comments:

Suppose I have the following table:

INSERT INTO orders(id_user, ordered_at) VALUES
('1', '2021-09-20 02:00:00'),
('1', '2021-10-06 02:00:00'),
('2', '2021-10-01 02:00:00'),
('3', '2021-10-06 02:00:00'),
('3', '2021-08-24 03:00:00'),
('4', '2021-08-20 02:00:00'),
('4', '2021-09-17 02:00:00'),
('4', '2021-09-20 02:00:00'),
('4', '2021-10-04 02:00:00');

The desired result would be the following:
The avg time between a user's orders can be calculated as follows:
AVG( time interval between orders )

example: id_user 1: AVG( 2021-09-20 02:00:00 - 2021-10-06 02:00:00) which equals 16 days and 0 hours interval

example: id user 4: ACG( (2021-08-20 02:00:00 - 2021-09-17 02:00:00) , (2021-09-17 02:00:00 - 2021-09-20 02:00:00) , (2021-09-20 02:00:00 - 2021-10-04 02:00:00)  )
example: id user 4: AVG( (interval '28 days') , (interval '3 days') , (interval '14 days')  )
example: id user 4: 28+3+14 / 3 = 15 --> result should be interval '15 days'

postgres=> SELECT magic(ordered_at) as avg_timedelta, id_user from orders where id_user='4' group by id_user;
            id_user            |    avg_timedelta
------------------------------+---------------------
              4                |      15 days

Best Answer

Something like:

select id_user, (max(ordered_at)-min(ordered_at)) / (count(1)-1)
from orders
group by id_user
having count(1) > 1;

If you want to use lead (or lag) instead, you can determine the duration between two consecutive rows and then apply avg on that set:

with t as (
    select id_user, ordered_at
         , lead(ordered_at) over (partition by id_user order by ordered_at) as ld_ordered_at
    from orders
)
select id_user, avg(ld_ordered_at - ordered_at) 
from t
where ld_ordered_at is not null
group by id_user;

Fiddle

The idea behind the first solution is that:

(x2-x1)+(x3-x2)+...+(x_n+1-x_n) <=>
(-x1+(x2-x2)+(x3-x3)+...+(x_n-1-x_n-1)+x_n) <=>
(x_n-x1)

and then divide with n-1 to get the average

Related Solutions

PostgreSQL – SQL Hourly Data Aggregation

select
  date_trunc('hour', t - interval '1 minute') as interv_start,
  date_trunc('hour', t - interval '1 minute')  + interval '1 hours' as interv_end,
 sum(v)
  from myt 
    group by date_trunc('hour', t - interval '1 minute')
order by interv_start

see sqlfiddle

As for the index: you could try a function index on date_trunc('hour', t - interval '1 minute') but I'm not sure postgresql can use it.

PostgreSQL – Using Variable to Determine Column Name in SELECT

I think you might be looking for the USING clause:

EXECUTE format('SELECT COALESCE(%I,col,$1) FROM public.TableMix 
                WHERE name = $2', ruleName)
        USING (username, whateverTheSecondParameterIs);

You use $1, $2, etc for placement-parameters, passed by USING. You use %I for itentifiers expanded by format.

Update: If you want to coalesce the column name ruleName if it's null, perhaps you want something more like:

EXECUTE format('SELECT %I FROM public.TableMix 
                WHERE name = $1', coalesce(ruleName, 'col'))
        USING (username);

You need to think about order of evaluation. What's part of the generated SQL, and what's evaluated in order to create that SQL?

Best Answer

Related Solutions

PostgreSQL – SQL Hourly Data Aggregation

PostgreSQL – Using Variable to Determine Column Name in SELECT

Related Question