Postgresql – Find the greatest difference between each unique record with different timestamps

greatest-n-per-grouporder-bypostgresqlrankwindow functions

I have a PostgreSQL table with the following schema and data:

CREATE TABLE IF NOT EXISTS T(
     id uuid PRIMARY KEY,
     username varchar(15),
     person varchar(10),
     tweets int,
     followers int,
     following int,
     likes int,
     created_at date) 
;
 
 id                                   | username     | person     | tweets | followers | following | likes | created_at
 :----------------------------------- | :----------- | :--------- | -----: | --------: | --------: | ----: | :---------
 3fa34100-d688-4051-a687-ec49d05e7212 | renok        | null       |    110 |         6 |         0 |     0 | 2020-10-10
 bab9ceb9-2770-49ea-8489-77e5d763a223 | Lydia_C      | test user2 |    515 |      1301 |      1852 |  1677 | 2020-10-10
 4649077a-9188-4821-a1ec-3b38608ea44a | Kingston_Sav | null       |   2730 |      1087 |      1082 |  1339 | 2020-10-10
 eef80836-e140-4adc-9598-8b612ab1825b | TP_s         | null       |   1835 |       998 |       956 |  1832 | 2020-10-10
 fd3ff8c7-0994-40b6-abe0-915368ab9ae5 | DKSnr4       | null       |    580 |       268 |       705 |   703 | 2020-10-10
 3fa34100-d688-4051-a687-ec49d05e7312 | renok        | null       |    119 |         6 |         0 |     0 | 2020-10-12
 bab9ceb9-2770-49ea-8489-77e5d763a224 | Lydia_C      | test user2 |    516 |      1301 |      1852 |  1687 | 2020-10-12
 4649077a-9188-4821-a1ec-3b38608ea44B | Kingston_Sav | null       |   2737 |      1090 |      1084 |  1342 | 2020-10-12
 eef80836-e140-4adc-9598-8b612ae1835c | TP_s         | null       |   1833 |       998 |       957 |  1837 | 2020-10-12
 fd3ff8c7-0994-40b6-abe0-915368ab7ab5 | DKSnr4       | null       |    570 |       268 |       700 |   703 | 2020-10-12

I intend to get the biggest difference between the most recent date and the next most recent date for each unique username and the find the username with the largest margin (difference) for example..In the above table the most recent date is 2020-10-12 and the next most recent date is 2020-10-10.

So I want to get something like this

 id                                   | username     | person | tweets | followers | following | likes | created_at | prev_followers | gain
 :----------------------------------- | :----------- | :----- | -----: | --------: | --------: | ----: | :--------- | -------------: | ---:
 4649077a-9188-4821-a1ec-3b38608ea44a | Kingston_Sav | null   |   2737 |      1090 |      1084 |  1342 | 2020-10-12 |           1087 |    3

Best Answer

Many ways lead to Rome. The below should be a good one (fast and flexible) to "find the username with the largest margin"

Assuming all involved columns are defined NOT NULL. And each username can only have one entry per day. Else you have to do more.

WITH cte AS (
   SELECT *, dense_rank() OVER (ORDER BY created_at DESC) AS rnk
   FROM   tbl
   )
SELECT d1.*
     , d2.followers AS prev_followers
     , d1.followers - d2.followers AS gain
FROM  (SELECT * FROM cte WHERE rnk = 1) d1
JOIN  (SELECT * FROM cte WHERE rnk = 2) d2 USING (username)
ORDER  BY gain DESC
        , d1.followers, username  -- added tiebreaker
LIMIT  1;

The CTE named cte attaches rank numbers with dense_rank() (not rank(), not row_number()). Then join the latest day (rnk = 1) with the one before (rnk = 2) and calculate the gain. Obviously, users must have entries for both days to qualify. Finally order by the gain and take the first row.

Note the added ORDER BY expressions to try and break possible ties: there can be multiple users with the same gain, so you have to define how to deal with that. One way is to add tiebreakes. In my example, a user with a smaller absolute number of followers is preferred (higher relative gain), and the alphabetically first wins if that's still ambiguous.

Or you return all "winners":

Again, many ways ... Postgres 13 added the standard SQL clause WITH TIES for that purpose exactly:

WITH cte AS (
   SELECT *, dense_rank() OVER (ORDER BY created_at DESC) AS rnk
   FROM   tbl
   )
SELECT d1.*
     , d2.followers AS prev_followers
     , d1.followers - d2.followers AS gain
FROM  (SELECT * FROM cte WHERE rnk = 1) d1
JOIN  (SELECT * FROM cte WHERE rnk = 2) d2 USING (username)
ORDER  BY gain DESC
FETCH  FIRST 1 ROWS WITH TIES;

db<>fiddle here

Detailed explanation for WITH TIES: