Postgresql – Get the combined row count by month from 3 tables

countjoin;postgresqlunion

Using Postgres 10+, I have three tables with similar structures, where I'm trying to get a count of rows grouped by months. I have a functional query working for two tables, as below:

CREATE TABLE classroom_observations (id SERIAL, created_at TIMESTAMPTZ);
CREATE TABLE training_modules       (id SERIAL, created_at TIMESTAMPTZ);
CREATE TABLE teachers_workshops     (id SERIAL, created_at TIMESTAMPTZ);

INSERT INTO classroom_observations VALUES
 (1, '2019-04-20 10:36:06+02')
,(2, '2019-05-22 15:22:33+02')
,(3, '2019-05-23 15:22:33+02')
,(4, '2019-05-24 15:22:33+02');
INSERT INTO training_modules VALUES
 (1, '2019-03-20 10:36:06+02')
,(2, '2019-04-22 15:22:33+02')
,(3, '2019-04-23 15:22:33+02')
,(4, '2019-05-24 15:22:33+02');
INSERT INTO teachers_workshops VALUES
 (1, '2019-03-20 10:36:06+02');
SELECT * FROM
  (
      SELECT to_char(co.created_at, 'YYYY-MM') AS month
           , COUNT(co.id) AS co_count
           , COUNT(tm.id) AS tm_count
        FROM classroom_observations co
       LEFT JOIN training_modules tm
          ON to_char(co.created_at, 'YYYY-MM') = to_char(tm.created_at, 'YYYY-MM')
       GROUP BY to_char(co.created_at, 'YYYY-MM'), to_char(tm.created_at, 'YYYY-MM')
      UNION
      SELECT to_char(tm.created_at, 'YYYY-MM') AS month
           , COUNT(co.id) AS co_count
           , COUNT(tm.id) AS tm_count
           FROM classroom_observations co
           RIGHT JOIN training_modules tm
          ON to_char(co.created_at, 'YYYY-MM') = to_char(tm.created_at, 'YYYY-MM')
       GROUP BY to_char(co.created_at, 'YYYY-MM'), to_char(tm.created_at, 'YYYY-MM')
      ) tbl
  ORDER BY Month;

This produces the expected result:

  month  | co_count | tm_count 
---------+----------+----------
 2019-03 |        0 |        1
 2019-04 |        1 |        2 
 2019-05 |        3 |        1

When I try to add the third table teachers_workshops, with count a count of tw.id as tw_count, I get the wrong result:

SELECT * FROM
  (
      SELECT to_char(co.created_at, 'YYYY-MM') AS month
           , COUNT(co.id) AS co_count
           , COUNT(tm.id) AS tm_count
           , COUNT(tw.id) AS tw_count
        FROM classroom_observations co
       LEFT JOIN training_modules tm
          ON to_char(co.created_at, 'YYYY-MM') = to_char(tm.created_at, 'YYYY-MM')
       LEFT JOIN teachers_workshops tw
          ON to_char(co.created_at, 'YYYY-MM') = to_char(tw.created_at, 'YYYY-MM')
       GROUP BY to_char(co.created_at, 'YYYY-MM'), to_char(tm.created_at, 'YYYY-MM') , to_char(tw.created_at, 'YYYY-MM')
      UNION
      SELECT to_char(tm.created_at, 'YYYY-MM') AS month
           , COUNT(co.id) AS co_count
           , COUNT(tm.id) AS tm_count
           , COUNT(tw.id) AS tw_count
           FROM classroom_observations co
           RIGHT JOIN training_modules tm
          ON to_char(co.created_at, 'YYYY-MM') = to_char(tm.created_at, 'YYYY-MM')
           RIGHT JOIN teachers_workshops tw
          ON to_char(co.created_at, 'YYYY-MM') = to_char(tw.created_at, 'YYYY-MM')
       GROUP BY to_char(co.created_at, 'YYYY-MM'), to_char(tm.created_at, 'YYYY-MM'), to_char(tw.created_at, 'YYYY-MM')
      UNION
      SELECT to_char(tm.created_at, 'YYYY-MM') AS month
           , COUNT(co.id) AS co_count
           , COUNT(tm.id) AS tm_count
           , COUNT(tw.id) AS tw_count
           FROM classroom_observations co
           RIGHT JOIN training_modules tm
          ON to_char(co.created_at, 'YYYY-MM') = to_char(tm.created_at, 'YYYY-MM')
           RIGHT JOIN teachers_workshops tw
          ON to_char(co.created_at, 'YYYY-MM') = to_char(tw.created_at, 'YYYY-MM')
       GROUP BY to_char(co.created_at, 'YYYY-MM'), to_char(tm.created_at, 'YYYY-MM'), to_char(tw.created_at, 'YYYY-MM')
      ) tbl
  ORDER BY Month;

Bad Result:

  month  | co_count | tm_count | tw_count 
---------+----------+----------+----------
 2019-04 |        1 |        0 |        0
 2019-05 |        6 |        6 |        6

There should only be 1 record for tw_count, and there should be another month included 2019-03. I'm clearly not wrapping my head around this properly, so any help would be greatly appreciated.

Best Answer

I'm trying to get a count of rows grouped by months

The queries you show do something potentially quite different. To achieve your declared objective:

SELECT to_char(mon, 'YYYY-MM') AS month
     , COALESCE(co.ct, 0) AS co_count
     , COALESCE(tm.ct, 0) AS tm_count
     , COALESCE(tw.ct, 0) AS tw_count
FROM  (
   SELECT date_trunc('month', created_at) AS mon, count(*) AS ct
   FROM   classroom_observations
   GROUP  BY mon
   ) co
FULL JOIN (
   SELECT date_trunc('month', created_at) AS mon, count(*) AS ct
   FROM   training_modules
   GROUP  BY mon
   ) tm USING (mon)
FULL JOIN (
   SELECT date_trunc('month', created_at) AS mon, count(*) AS ct
   FROM   teachers_workshops
   GROUP  BY mon
   ) tw USING (mon)
ORDER  BY mon;

Result:

  month  | co_count | tm_count | tw_count 
---------+----------+----------+----------
 2019-03 |        0 |        1 |        1
 2019-04 |        1 |        2 |        0
 2019-05 |        3 |        1 |        0

db<>fiddle here

  • FULL [OUTER] JOIN keeps all rows, even without match on the other side.

  • Results in NULL values where no rows are found - which I replace with 0 using COALESCE as an optional addition.

  • date_trunc() is faster and more reliable than to_char(). Aggregate and join on the truncated timestamp and only format once with to_char() in the outer SELECT.