Postgresql – Combine a generated series with each row from a select

join;postgresql

The below select is returning an array of all the projects owned by all my users.

WITH user_projects AS
(SELECT
  u.id as user_id
  ,COALESCE(array_agg(DISTINCT pa.project_id), '{0}'::integer[]) as project_ids
FROM users u
LEFT JOIN project_assignments pa on pa.user_creator_id = u.id
GROUP BY u.id
)

user: 1, project_ids: {1, 2}
user: 2, project_ids: {3, 4}

Using that select I would like to proceed a report, which will display for each user a 12 month summary.

SELECT
  to_char(pk.created_at,'Mon') as mon
 ,extract(year from pk.created_at) as yyyy
 ,up.user_id
 ,COALESCE(count(distinct pk.keyword_id), 0) as total_keywords
FROM user_projects up
LEFT JOIN project_keywords pk on pk.project_id = ANY(up.project_ids::int[])
GROUP BY up.user_id, up.project_ids, 1, 2

Output:

 mon: sept, yyyy: 2014, user_id: 1, total_keywords: 10,
 mon: , yyyy: , user_id: 2, total_keywords: 0,

The select from above is doing what I need but is not displaying all 12 months, only months for which it can find data in project_keywords and for users which don't have any data into project_keywords it will display 0 and the month and year columns will be empty.

So how can I produce a report for each user displaying the last 12 months since now(), and where no data is available fill the month and year columns?

I believe in order to do that I need to have a generate_series which will return the last 12 months so I did this:

(select created_at from generate_series(now()::date - INTERVAL '1 year',now()::date,'1 month') as created_at)

But now I'm stuck and I don't know how to combine all of this.

The expected output should be

For user 1:

mon: sept, yyyy: 2013, user_id: 1, total_keywords: 10,
mon: oct, yyyy: 2013, user_id: 1, total_keywords: 11
...
mon: aug, yyyy: 2014, user_id: 1, total_keywords: 0
mon: sept, yyyy: 2014, user_id: 1, total_keywords: 2

For user 2:

mon: sept, yyyy: 2013, user_id: 2, total_keywords: 10,
mon: oct, yyyy: 2013, user_id: 2, total_keywords: 0,
...
mon: aug, yyyy: 2014, user_id: 2, total_keywords: 0,
mon: sept, yyyy: 2014, user_id: 2, total_keywords: 0

http://sqlfiddle.com/#!15/8c911/6

Using PostgreSQL 9.3.

Best Answer

You need to do a left outer join on the generate_series with an ON clause that matches the date of each report.

Without a schema or sample data it's a bit fiddly to cook up an example modification of your actual query.

Here's a simplified example to show you how it works:

CREATE TABLE sparse_dates(
    bakedgood text primary key,
    firstbaked date not null
);

INSERT INTO sparse_dates (bakedgood, firstbaked) VALUES
('tart', '2012-02-01'),
('baguette', '2012-02-01'),
('cookie', '2012-03-01'),
('macaron', '2012-08-01');

To get the number of baked goods first baked on a given month:

SELECT monthtimestamp, count(bakedgood) 
FROM generate_series( 
    (SELECT min(firstbaked) FROM sparse_dates),
    (SELECT max(firstbaked) FROM sparse_dates),
    INTERVAL '1' MONTH) 
    AS monthtimestamp
LEFT OUTER JOIN sparse_dates ON (monthtimestamp = firstbaked)
GROUP BY monthtimestamp;

with output like:

     monthtimestamp     | count 
------------------------+-------
 2012-02-01 00:00:00+08 |     2
 2012-03-01 00:00:00+08 |     1
 2012-04-01 00:00:00+08 |     0
 2012-05-01 00:00:00+08 |     0
 2012-06-01 00:00:00+08 |     0
 2012-07-01 00:00:00+08 |     0
 2012-08-01 00:00:00+08 |     1
(7 rows)