Postgresql – Using only vanilla PostgreSQL, how to predict a future number based on existing data

postgresql

I have a table called "events", with a timestamp column.

Based on that timestamp column's current records, I want to determine when the next record in the future is likely to occur.

I don't want to install any software, extensions, or pay for any APIs or register any accounts. I'm talking about the most simple and easy-to-express method to get a fair estimate of which future timestamp will correspond to the next record.

Is there some kind of basic mathematical formula for this? The expression should not know anything about what kind of data is in the table, nor care about it, and just use some kind of math such as taking the sum() of all the existing timestamps and dividing them by the position of the moon or something.

Anything which will produce a seemingly sensible estimate. "For entertainment purposes only", rather than life-critical medical decisions.

Best Answer

Besides the point that there is literally a whole branch of mathematics which is only focusing on answering your question, which should give you a hint that there is no "silver bullte" easy formula to use in any case.

WARNING I only have a basic understanding of statistics but I know that you cannot use the mean as a reliable value to predict future values. In almost all, if not all, cases a prediction depends very much on the distribution type of your events. This mean value will only get you a rough estimate which might be better than selecting a random number.

Anyway here is my implementation of a mean timestamp between two events:

Create table eventsTable(
  id int,
  TS timestamp);

Insert into eventsTable VALUES (1, '2020-05-07 08:00');
Insert into eventsTable VALUES (1, '2020-05-07 08:05');
Insert into eventsTable VALUES (1, '2020-05-07 08:10');
Insert into eventsTable VALUES (1, '2020-05-07 08:15');
Insert into eventsTable VALUES (1, '2020-05-07 08:20');
Insert into eventsTable VALUES (1, '2020-05-07 08:25');


SELECT 
  EXTRACT(EPOCH FROM (Max(TS) - Min(TS))) / ( COUNT(*)  -1) as Mean_Number_Of_Seconds_Untill_Next_Event
FROM eventsTable
WHERE TS > '2020-05-07 08:10'

Fiddle link

It just takes the maximum and minimum value and divides it by the number of new events in that time interval. Use the where clause to find a good time interval for performance and prediction correctness.