PostgreSQL – Join and count boolean columns by logical OR

aggregate, postgresql

There are two tables: user_table and schedule_table. Each user has four schedule rows (one for each 6-hour period of the day), with one boolean column for every workday.

The desired result is a count, for each day of the work week, of the users who are present for at least one period that day. For example, a user who is present for two quarters of a day counts once.

This is what I thought would work for a Monday, but it's incorrect:

SELECT 
 count(CASE WHEN TRUE = ANY(ARRAY_AGG(mon)) THEN 1 ELSE null END) as mon,
FROM user_table 
LEFT JOIN schedule_table ON user_table.id = schedule_table.user_id
GROUP BY user_id

I created a SQLFiddle to demonstrate.

EDIT:

I was looking for a summary table like this:

+-----+-----+-----+-----+-----+
| Mon | Tue | Wed | Thu | Fri |
+-----+-----+-----+-----+-----+
|   4 |   3 |   4 |   4 |   4 |
+-----+-----+-----+-----+-----+

I am running PostgreSQL 9.3.

Here are the table definitions I am using:

CREATE TABLE user_table
(
  id integer,
  start_dt date
);


CREATE TABLE schedule_table
(
  id integer,
  user_id integer,
  mon boolean,
  tue boolean,
  wed boolean,
  thu boolean,
  fri boolean
);

Best Answer

All users present on Monday:

SELECT u.*
FROM  (
   SELECT user_id
   FROM   schedule_table 
   GROUP  BY 1
   HAVING bool_or(mon)
   ) s
JOIN   user_table u ON u.id = s.user_id;

The aggregate function bool_or() comes in handy.
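The attempt in the question nests ARRAY_AGG() inside count(), and PostgreSQL does not allow aggregate function calls to be nested. bool_or() does the OR step in a single aggregation level: it returns true if at least one non-null input in the group is true. A minimal sketch with hypothetical inline data (the VALUES rows are made up for illustration and are not from the question's fiddle):

```sql
-- Hypothetical data: user 1 is present in one of four Monday periods,
-- user 2 in none of them.
SELECT user_id
     , bool_or(mon) AS mon_present
FROM  (
   VALUES (1, false), (1, true),  (1, false), (1, false)
        , (2, false), (2, false), (2, false), (2, false)
   ) t(user_id, mon)
GROUP  BY user_id
ORDER  BY user_id;

-- user_id | mon_present
-- --------+------------
--       1 | t
--       2 | f
```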

Summary table:

SELECT count(mon OR NULL) AS mon
     , count(tue OR NULL) AS tue
     , count(wed OR NULL) AS wed
     , count(thu OR NULL) AS thu
     , count(fri OR NULL) AS fri
FROM  (
   SELECT user_id
        , bool_or(mon) AS mon
        , bool_or(tue) AS tue
        , bool_or(wed) AS wed
        , bool_or(thu) AS thu
        , bool_or(fri) AS fri
   FROM   schedule_table 
   GROUP  BY 1
   ) s;

Result as desired:

mon tue wed thu fri
4   3   4   4   4

Note that I didn't join to user_table at all here. It's not needed.


About counting techniques:
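As a rough illustration, here are a few equivalent ways to count only the true values of a boolean column that should all work on PostgreSQL 9.3. The inline VALUES list is hypothetical and stands in for the per-user bool_or() results. The aggregate FILTER clause is another option, but it was only added in PostgreSQL 9.4, so it is left out here.

```sql
-- Hypothetical input: two true values, one false, one NULL.
SELECT count(mon OR NULL)              AS via_or_null  -- true OR NULL = true, false OR NULL = NULL
     , count(NULLIF(mon, false))       AS via_nullif   -- false becomes NULL and is not counted
     , sum(mon::int)                   AS via_sum      -- true = 1, false = 0, NULL is ignored
     , count(CASE WHEN mon THEN 1 END) AS via_case     -- CASE without ELSE yields NULL
FROM  (VALUES (true), (false), (true), (NULL::boolean)) t(mon);

-- via_or_null | via_nullif | via_sum | via_case
-- ------------+------------+---------+---------
--           2 |          2 |       2 |        2
```

All four variants rely on the same fact: count() and sum() skip NULL inputs. Note that sum() returns NULL rather than 0 when there are no non-null rows at all.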


However, your database design is odd: schedule_table has no information to signify the period of the day. And why bundle the attributes for the same period on all days into one row?

Either, to store explicit information:

CREATE TABLE schedule (
  schedule_id int PRIMARY KEY
, schedule text NOT NULL
);

INSERT INTO schedule VALUES
  (11, 'mon, period 1')
, (12, 'mon, period 2')
, (13, 'mon, period 3')
, (14, 'mon, period 4')
, (21, 'tue, period 1')
, (22, 'tue, period 2')
  -- etc.
  ;

CREATE TABLE user_schedule (
  user_schedule_id integer,
  user_id int REFERENCES users,
  schedule_id int REFERENCES schedule
);
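With this layout, the summary could be sketched as follows. This assumes the ID scheme suggested by the sample rows, where schedule_id / 10 yields the day (1 = mon, 2 = tue, and so on); the answer does not state that this encoding is intended, so treat it as a hypothetical convention.

```sql
-- Assumption: integer division schedule_id / 10 identifies the day
-- (11..14 = mon, 21..24 = tue, ...), as in the sample INSERT.
-- count(DISTINCT ...) counts each user once per day and skips NULLs.
SELECT count(DISTINCT CASE WHEN schedule_id / 10 = 1 THEN user_id END) AS mon
     , count(DISTINCT CASE WHEN schedule_id / 10 = 2 THEN user_id END) AS tue
     , count(DISTINCT CASE WHEN schedule_id / 10 = 3 THEN user_id END) AS wed
     , count(DISTINCT CASE WHEN schedule_id / 10 = 4 THEN user_id END) AS thu
     , count(DISTINCT CASE WHEN schedule_id / 10 = 5 THEN user_id END) AS fri
FROM   user_schedule;
```

Deriving the day from the ID is brittle. A separate day column in schedule, joined in this query, would probably be cleaner.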

Only enter rows for periods in which the user is present.

Or, to optimize storage:

CREATE TABLE users (
  user_id serial PRIMARY KEY
, start_dt date
, presence_bitmap bit(20)
);

INSERT INTO users (start_dt, presence_bitmap) VALUES
 ('2015-02-14', '11111111111111111111'),
 ('2015-02-14', '00000000000000001111'); -- etc.

This needs only a fraction of the storage on disk.
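A possible way to get the summary from the bitmap, as a sketch only. It assumes a bit layout that the answer does not spell out: bits 1-4 hold Monday's four periods, bits 5-8 Tuesday's, and so on up to bits 17-20 for Friday.

```sql
-- Assumed layout (hypothetical): 4 consecutive bits per workday, Monday first.
-- A user is present on a day if any of that day's 4 bits is set.
SELECT count((substring(presence_bitmap FROM  1 FOR 4) <> B'0000') OR NULL) AS mon
     , count((substring(presence_bitmap FROM  5 FOR 4) <> B'0000') OR NULL) AS tue
     , count((substring(presence_bitmap FROM  9 FOR 4) <> B'0000') OR NULL) AS wed
     , count((substring(presence_bitmap FROM 13 FOR 4) <> B'0000') OR NULL) AS thu
     , count((substring(presence_bitmap FROM 17 FOR 4) <> B'0000') OR NULL) AS fri
FROM   users;
```

Under that assumed layout, the two sample rows above would give 1 for mon through thu and 2 for fri.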