Postgresql – How to calculate the cumulative difference between rows in Postgres

postgresqlpostgresql-9.3

I am in need of calculating the difference from the cumulative values for the previous day in Postgres based on session_id. Example as mentioned below:

CREATE TEMP TABLE foo AS
SELECT date::date, session_id, upload_usage, download_usage, total_usage_on_a_day
FROM ( VALUES
  ( '10/21/2014', '0007994b', 37578561   , 6800209   , 44378770 ),
  ( '10/22/2014', '0007994b', 218113296  , 85272007  , 303385303 ),
  ( '10/23/2014', '0007994b', 552228616  , 252390680 , 804619296 ) ,
  ( '10/24/2014', '0007994b', 799772020  , 391196041 , 1190968061 ),
  ( '10/25/2014', '0007994b', 1047233978 , 529908804 , 1577142782 ),
  ( '10/26/2014', '0007994b', 1294608258 , 668515778 , 1963124036 ),
  ( '10/27/2014', '0007994b', 1066656794 , 557318645 , 2573613674 ),
  ( '10/27/2014', '00079e4e', 12949219   , 7265243   , 20214462 ),
  ( '10/28/2014', '00079e4e', 203871297  , 114308478 , 318179775 ),
  ( '10/29/2014', '00079e4e', 445466682  , 251486943 , 696953625 ),
  ( '10/30/2014', '00079e4e', 183499477  , 109643736 , 893143213 )
) AS t( date, session_id, upload_usage, download_usage, total_usage_on_a_day );

Expected Results:

Date        session_id  upload_usage    download_usage  total_usage_on_a_day    Extected_difference
10/21/2014  0007994b    37578561        6800209         44378770                44378770
10/22/2014  0007994b    218113296       85272007        303385303               259006533
10/23/2014  0007994b    552228616       252390680       804619296               501233993
10/24/2014  0007994b    799772020       391196041       1190968061              386348765
10/25/2014  0007994b    1047233978      529908804       1577142782              386174721
10/26/2014  0007994b    1294608258      668515778       1963124036              385981254
10/27/2014  0007994b    1066656794      557318645       2573613674              610489638
10/27/2014  00079e4e    12949219        7265243         20214462                20214462
10/28/2014  00079e4e    203871297       114308478       318179775               297965313
10/29/2014  00079e4e    445466682       251486943       696953625               378773850
10/30/2014  00079e4e    183499477       109643736       893143213               196189588

Basically I have to calculate the usage on daily basis from the total_usage_on_a_day for a particular session.

I am not good with window functions. I tried to find the day difference but how to find data level difference?

select date,sesion_id, SUM(upload_usage)AS UPLOAD,
SUM(download_usage)AS DOWNLOAD,
max(total_usage_on_a_day)AS TOTAL_AS_CUMM,
lag(daTE) over (PARTITION BY sesion_id ORDER BY daTE ASC) as previous_id,
lead(daTE) over (PARTITION BY sesion_id ORDER BY daTE ASC) as present_id
from jiodba.s_crc_zda_mon_conn_usage where GPART= '1100043958' AND zzaccess_ntwk_id = 'FTTH'
GROUP BY sesion_id,date
ORDER BY 1 limit 100;

Best Answer

You can use LAG() function in a query with GROUP BY just like any other query. The only difference is that the columns allowed in the window (OVER) and in the LAG are the ones allowed in SELECT after a GROUP BY:

select 
    date,
    session_id, 
    sum(upload_usage) as upload,
    sum(download_usage) as download,
    sum(total_usage_on_a_day) as total_as_cumm,
    sum(total_usage_on_a_day) 
    - coalesce(lag(sum(total_usage_on_a_day)) over (partition by session_id order by date), 0)
        as expected_difference
 from jiodba.s_crc_zda_mon_conn_usage 
 where gpart = '1100043958' 
   and zzaccess_ntwk_id = 'FTTH' 
 group by session_id, date
 order by session_id, date 
 limit 100 ;
  • I'm not sure why you used max(usage) while you name the column total_usage. I changed it to use sum() instead. If you need to use max(usage) there, pick a more appropriate name for that column.