Postgresql – Return results based on total days computed

postgresql

I'll give a background on how is organized our tables that are involved in the problem I want to solve. We have the following structure in our database:

-- In STAFF table we save info about our employees
-- such as their staff_id, name, surname, email, etc

+---------------------------+
|           staff           |
+----------+------+---------+
| staff_id | name | surname |
+----------+------+---------+
|   1234   | Jose |   Ruz   |
|   ....   | .... |   ...   |
+----------+------+---------+


-- In PLAN table we save data that tells at which project is a staff
-- working on in the fortnight, we store the project id (another table
-- but not of interest in this query), the staff associated with the
-- project, hours planned to work and date (fortnight) of this plan

-- In the example below, staff with id 1234 (Jose) must work 88 total
-- hours in the second fortnight of march (2018-3-16 to 2018-3-31) on two
-- projects

+-------------------------------------------+
|                    plan                   |
+------------+----------+-------+-----------+
| project_id | staff_id | hours |    date   |
+------------+----------+-------+-----------+
|    9999    |   1234   |   44  | 2018-3-30 |
|    9998    |   1234   |   44  | 2018-3-30 |
|    ....    |   ....   |   ..  |    ...    |
+------------+----------+-------+-----------+


-- And the last table involved is REPORT_HC, where the staff save information
-- about their work to the projects; they must report (save data in this table)
-- everyday, so, for example:

+-----------------------------------------------------------+
|                           report_hc                       |
+------------+----------+-------+---------------------------+
| project_id | staff_id | hours |           date            |
+------------+----------+-------+---------------------------+
|    9999    |   1234   |   6   | 2018-3-16 14:15:41.487-04 |
|    9998    |   1234   |   2   | 2018-3-16 18:44:34.487-04 |
|    9998    |   1234   |   8   | 2018-3-19 08:11:05.930-04 |
|    ....    |   ....   |  ...  |           .....           |
+------------+----------+-------+---------------------------+

-- project_id column in report_hc doesn't matter here but I put it
-- to give a better understanding

This is the query that I'm trying:

-- Generate days workables in the fortnight
WITH days_workables AS (
  SELECT  COUNT(*)  AS total
  FROM  GENERATE_SERIES(timestamp '2018-3-16', timestamp '2018-3-31', interval '1 day') AS days

  -- Get only week days (monday, tuesday, ..., friday)
  WHERE EXTRACT('ISODOW' FROM days) < 6
),

calc AS (
  SELECT  hc.date::date                             AS date_reference,
          SUM(hc.hours)                             AS hours_reported_per_day,
          TO_CHAR(AVG(hc.date::time), 'HH12:MI AM') AS average_time_per_day,

          st.staff_id                   AS staff_id,
          st.surname || ', ' || st.name AS staff_name,

          -- Depending on the hours planned and the available days
          -- to work in the fortnight, each staff have an average of
          -- hours that needs to report every workable day, for example,
          -- if a user has 88 hours planned and there are 11 days workables
          -- he must report 8 hours each day
          --
          -- We use COALESCE to force 0 in SUM instead of possible NULL 
          -- and NULLIF to avoid division by zero 
          (
            SELECT  COALESCE(SUM(pl.hours), 0) AS hours
            FROM  v1.plan AS pl
            WHERE pl.date BETWEEN '2018-3-16' AND '2018-3-31'
              AND pl.staff_id = st.staff_id
          )
          /
          NULLIF((
            SELECT  total
            FROM  days_workables
          ), 0) AS hours_to_report_per_day,

          -- Ok, once we have the hours_to_report_per_day we must
          -- calculate if the user exceeded or reported less hours
          -- than the expressed this day, we calculate it using a
          -- percentage form, so for example, if the user should report
          -- 8 hours but instead he did 5 the percentage will be -37.5
          -- taking 8 hours as a 100%
          ROUND(((
            (SUM(hc.hours) * 100.0)
            /
            NULLIF(((
              SELECT  COALESCE(SUM(pl.hours), 0) AS hours 
              FROM  v1.plan AS pl 
              WHERE pl.date BETWEEN '2018-3-16' AND '2018-3-31' 
                AND pl.staff_id = st.staff_id
            ) / NULLIF((
              SELECT  * 
              FROM  days_workables
            ), 0)), 0)
          ) - 100.0), 2) AS percentage_of_report_achieved
FROM      v1.staff      AS st
LEFT JOIN v1.report_hc  AS hc ON (hc.staff_id = st.staff_id AND hc.date BETWEEN '2018-3-16' AND '2018-3-31')

WHERE st.staff_id IN (
  SELECT DISTINCT staff_id
  FROM  v1.plan
  WHERE date BETWEEN '2018-3-16' AND '2018-3-31'
)
GROUP BY  hc.date::date,
          st.staff_id,
          st.surname,
          st.name
)

-- We use non-descriptive alias only to get a 'nicely-view' result
SELECT  date_reference AS a,
        average_time_per_day AS g,
        staff_id AS b,
        staff_name AS c,
        hours_to_report_per_day AS d,
        hours_reported_per_day AS e,
        percentage_of_report_achieved AS f,
        SUM(percentage_of_report_achieved) OVER w AS h
FROM  calc

-- We create a partition to sum at the last each value get from percentage_of_report_achieved
WINDOW w AS (PARTITION BY staff_id ORDER BY fecha_reporte)
ORDER BY  nombre ASC,
          fecha_reporte ASC

This is part of the result I get:

     a      |    g     |   b   |                  c                   | d  | e  |   f    |    h    
------------+----------+-------+--------------------------------------+----+----+--------+---------
 2018-03-16 | 07:23 AM | 12879 | Andrade McCann, Wendy Daniele        |  8 |  8 |   0.00 |    0.00
 2018-03-19 | 06:22 AM | 12879 | Andrade McCann, Wendy Daniele        |  8 |  8 |   0.00 |    0.00
 2018-03-20 | 08:45 AM | 12879 | Andrade McCann, Wendy Daniele        |  8 |  8 |   0.00 |    0.00
 2018-03-21 | 07:15 AM | 12879 | Andrade McCann, Wendy Daniele        |  8 |  8 |   0.00 |    0.00
 2018-03-22 | 02:51 AM | 12879 | Andrade McCann, Wendy Daniele        |  8 |  8 |   0.00 |    0.00
 2018-03-23 | 07:29 AM | 12879 | Andrade McCann, Wendy Daniele        |  8 |  8 |   0.00 |    0.00
 2018-03-26 | 11:43 AM | 12879 | Andrade McCann, Wendy Daniele        |  8 |  8 |   0.00 |    0.00
 2018-03-27 | 08:45 AM | 12879 | Andrade McCann, Wendy Daniele        |  8 |  8 |   0.00 |    0.00
 2018-03-28 | 08:40 AM | 12879 | Andrade McCann, Wendy Daniele        |  8 |  5 | -37.50 |  -37.50
 2018-03-16 | 10:26 AM | 12855 | Aarons Wujcik, Peter Charles         |  6 |  8 |  33.33 |   33.33
 2018-03-19 | 08:42 AM | 12855 | Aarons Wujcik, Peter Charles         |  6 |  8 |  33.33 |   66.66
 2018-03-20 | 09:11 AM | 12855 | Aarons Wujcik, Peter Charles         |  6 |  8 |  33.33 |   99.99
 2018-03-23 | 07:14 AM | 12855 | Aarons Wujcik, Peter Charles         |  6 |  8 |  33.33 |  133.32
 2018-03-26 | 11:17 AM | 12855 | Aarons Wujcik, Peter Charles         |  6 |  8 |  33.33 |  166.65
 2018-03-27 | 08:13 AM | 12855 | Aarons Wujcik, Peter Charles         |  6 |  8 |  33.33 |  199.98
 2018-03-16 | 09:33 AM | 13511 | Aray Cappello, Janet Andrea          |  6 |  8 |  33.33 |   33.33
 2018-03-19 | 10:23 AM | 13511 | Aray Cappello, Janet Andrea          |  6 |  8 |  33.33 |   66.66
 2018-03-20 | 08:21 AM | 13511 | Aray Cappello, Janet Andrea          |  6 |  8 |  33.33 |   99.99
 2018-03-21 | 08:30 AM | 13511 | Aray Cappello, Janet Andrea          |  6 |  8 |  33.33 |  133.32
 2018-03-22 | 07:52 AM | 13511 | Aray Cappello, Janet Andrea          |  6 |  8 |  33.33 |  166.65
 2018-03-23 | 08:17 AM | 13511 | Aray Cappello, Janet Andrea          |  6 |  8 |  33.33 |  199.98
 2018-03-26 | 11:08 AM | 13511 | Aray Cappello, Janet Andrea          |  6 |  8 |  33.33 |  233.31
 2018-03-27 | 07:40 AM | 13511 | Aray Cappello, Janet Andrea          |  6 |  8 |  33.33 |  266.64
 2018-03-28 | 08:31 AM | 13511 | Aray Cappello, Janet Andrea          |  6 |  8 |  33.33 |  299.97
            |          | 13536 | Lascody Hernández, Christian Gabriel |  6 |    |        |        
 2018-03-16 | 10:30 AM | 12670 | Astudillo Méndez, Juan Ernesto       |  6 |  8 |  33.33 |   33.33
 2018-03-19 | 08:22 AM | 12670 | Campanaro Méndez, Juan Ernesto       |  6 |  8 |  33.33 |   66.66
 2018-03-20 | 10:02 AM | 12670 | Campanaro Méndez, Juan Ernesto       |  6 |  8 |  33.33 |   99.99
 2018-03-22 | 10:41 AM | 12670 | Campanaro Méndez, Juan Ernesto       |  6 |  8 |  33.33 |  133.32
 2018-03-26 | 11:17 AM | 12670 | Campanaro Méndez, Juan Ernesto       |  6 |  8 |  33.33 |  166.65
 2018-03-28 | 09:58 AM | 12670 | Campanaro Méndez, Juan Ernesto       |  6 |  6 |   0.00 |  166.65

But there are many problems, take a look at Christian Gabriel, he was planned that fortnight for 6 hours to report every day, but he didn't report anything, I need to be able to see by each day (2018-03-16, 2018-03-19, 2018-03-20, 2018-03-21, etc) 6 hours in the column d (hours_to_report_per_day) and 0 (because he didn't report anything) in the e column (hours_reported_per_day).

In the same manner, Peter Charles, for example, didn't report at 21 and 22 of march, but that result doesn't appear and to see 6 in d column and 0 in e column.

Any help on how can I get the right result? Tell me if you don't understand very well my explanation and I'll do my best to improve it. Thanks.

Best Answer

first create the list of days:

WITH days AS (
   SELECT  COUNT(*)  AS total
   FROM  GENERATE_SERIES(date '2018-3-16', date '2018-3-31',  interval '1 day') AS day

-- Get only week days (monday, tuesday, ..., friday)  
    WHERE EXTRACT('ISODOW' FROM day) < 6
),

-- then from that days_workables

 days_workables AS (
  SELECT  COUNT(*)  AS total FROM days
),

-- then finally cross join days

calc AS ( 
  SELECT  days.day AS date_reference,
          SUM(hc.hours)                             AS hours_reported_per_day,
          TO_CHAR(AVG(hc.date::time), 'HH12:MI AM') AS average_time_per_day,

          st.staff_id                   AS staff_id,
          st.surname || ', ' || st.name AS staff_name,
--  ...
FROM      v1.staff      AS st
,days 
LEFT JOIN v1.report_hc  AS hc ON (hc.staff_id = st.staff_id AND hc.date BETWEEN '2018-3-16' AND '2018-3-31') 
   and days.day=hc.date::date
),