Postgresql – Cumulative data and dates – the joins aren’t working

join;postgresql

So I want to get cumulative data, for every day since a start date until an end date (or today). I have constructed a query that will get the data from a table, but for some reason, some dates are missing. My left join isn't working and I am not getting the desired results. Can anybody point out my mistake?

(also, as a small detail, how do I only get the date and not the timestamp?)

Current result (note the missing dates):

          date          |   sum    
------------------------+----------
 2012-09-01 00:00:00+02 |    8.000
 2012-09-03 00:00:00+02 |   29.300
 2012-09-07 00:00:00+02 |   31.300

Expected result (a missing date behaves as distance 0, so sum shouldn't increase):

          date          |   sum    
------------------------+----------
 2012-09-01 00:00:00+02 |    8.000
 2012-09-02 00:00:00+02 |    8.000
 2012-09-03 00:00:00+02 |   29.300
 2012-09-04 00:00:00+02 |   29.300
 2012-09-05 00:00:00+02 |   29.300
 2012-09-06 00:00:00+02 |   29.300
 2012-09-07 00:00:00+02 |   31.300

Query:

WITH dates  as (SELECT min(date) as start_date,
                       max(date) as end_date
                from training_training
                where athlete_id = 1)

SELECT  distinct(d.date),
        sum(distance) OVER (ORDER BY d.date)
FROM
      (
       SELECT generate_series(start_date, end_date, interval '1 day') as date
       FROM dates
      ) d
LEFT JOIN training_training
     ON         d.date = training_training.date
     WHERE athlete_id  =  1
     AND   kind        in ('t', 'd', 'i', 'w')

GROUP BY d.date, distance
ORDER BY d.date

Best Answer

As inspired by a_horse_with_no_name, I moved the right part of the join to a subquery so the where clause doesn't interfere with the join condition

This works as intended:

WITH dates  as (SELECT min(date) as start_date,
                       max(date) as end_date
                from training_training
                where athlete_id = 1)

SELECT  distinct(d.date),
   sum(distance) OVER (ORDER BY d.date)
FROM
  (
    SELECT generate_series(start_date, end_date, interval '1 day') as date
    FROM dates
  ) d
LEFT JOIN
  (
    SELECT
      date,
      distance
    FROM training_training
    WHERE athlete_id = 1
          AND kind IN ('t', 'd', 'i', 'w')
  ) t
ON d.date = t.date

GROUP BY d.date, distance
ORDER BY d.date