I'll give a background on how is organized our tables that are involved in the problem I want to solve. We have the following structure in our database:
-- In STAFF table we save info about our employees
-- such as their staff_id, name, surname, email, etc
+---------------------------+
| staff |
+----------+------+---------+
| staff_id | name | surname |
+----------+------+---------+
| 1234 | Jose | Ruz |
| .... | .... | ... |
+----------+------+---------+
-- In PLAN table we save data that tells at which project is a staff
-- working on in the fortnight, we store the project id (another table
-- but not of interest in this query), the staff associated with the
-- project, hours planned to work and date (fortnight) of this plan
-- In the example below, staff with id 1234 (Jose) must work 88 total
-- hours in the second fortnight of march (2018-3-16 to 2018-3-31) on two
-- projects
+-------------------------------------------+
| plan |
+------------+----------+-------+-----------+
| project_id | staff_id | hours | date |
+------------+----------+-------+-----------+
| 9999 | 1234 | 44 | 2018-3-30 |
| 9998 | 1234 | 44 | 2018-3-30 |
| .... | .... | .. | ... |
+------------+----------+-------+-----------+
-- And the last table involved is REPORT_HC, where the staff save information
-- about their work to the projects; they must report (save data in this table)
-- everyday, so, for example:
+-----------------------------------------------------------+
| report_hc |
+------------+----------+-------+---------------------------+
| project_id | staff_id | hours | date |
+------------+----------+-------+---------------------------+
| 9999 | 1234 | 6 | 2018-3-16 14:15:41.487-04 |
| 9998 | 1234 | 2 | 2018-3-16 18:44:34.487-04 |
| 9998 | 1234 | 8 | 2018-3-19 08:11:05.930-04 |
| .... | .... | ... | ..... |
+------------+----------+-------+---------------------------+
-- project_id column in report_hc doesn't matter here but I put it
-- to give a better understanding
This is the query that I'm trying:
-- Generate days workables in the fortnight
WITH days_workables AS (
SELECT COUNT(*) AS total
FROM GENERATE_SERIES(timestamp '2018-3-16', timestamp '2018-3-31', interval '1 day') AS days
-- Get only week days (monday, tuesday, ..., friday)
WHERE EXTRACT('ISODOW' FROM days) < 6
),
calc AS (
SELECT hc.date::date AS date_reference,
SUM(hc.hours) AS hours_reported_per_day,
TO_CHAR(AVG(hc.date::time), 'HH12:MI AM') AS average_time_per_day,
st.staff_id AS staff_id,
st.surname || ', ' || st.name AS staff_name,
-- Depending on the hours planned and the available days
-- to work in the fortnight, each staff have an average of
-- hours that needs to report every workable day, for example,
-- if a user has 88 hours planned and there are 11 days workables
-- he must report 8 hours each day
--
-- We use COALESCE to force 0 in SUM instead of possible NULL
-- and NULLIF to avoid division by zero
(
SELECT COALESCE(SUM(pl.hours), 0) AS hours
FROM v1.plan AS pl
WHERE pl.date BETWEEN '2018-3-16' AND '2018-3-31'
AND pl.staff_id = st.staff_id
)
/
NULLIF((
SELECT total
FROM days_workables
), 0) AS hours_to_report_per_day,
-- Ok, once we have the hours_to_report_per_day we must
-- calculate if the user exceeded or reported less hours
-- than the expressed this day, we calculate it using a
-- percentage form, so for example, if the user should report
-- 8 hours but instead he did 5 the percentage will be -37.5
-- taking 8 hours as a 100%
ROUND(((
(SUM(hc.hours) * 100.0)
/
NULLIF(((
SELECT COALESCE(SUM(pl.hours), 0) AS hours
FROM v1.plan AS pl
WHERE pl.date BETWEEN '2018-3-16' AND '2018-3-31'
AND pl.staff_id = st.staff_id
) / NULLIF((
SELECT *
FROM days_workables
), 0)), 0)
) - 100.0), 2) AS percentage_of_report_achieved
FROM v1.staff AS st
LEFT JOIN v1.report_hc AS hc ON (hc.staff_id = st.staff_id AND hc.date BETWEEN '2018-3-16' AND '2018-3-31')
WHERE st.staff_id IN (
SELECT DISTINCT staff_id
FROM v1.plan
WHERE date BETWEEN '2018-3-16' AND '2018-3-31'
)
GROUP BY hc.date::date,
st.staff_id,
st.surname,
st.name
)
-- We use non-descriptive alias only to get a 'nicely-view' result
SELECT date_reference AS a,
average_time_per_day AS g,
staff_id AS b,
staff_name AS c,
hours_to_report_per_day AS d,
hours_reported_per_day AS e,
percentage_of_report_achieved AS f,
SUM(percentage_of_report_achieved) OVER w AS h
FROM calc
-- We create a partition to sum at the last each value get from percentage_of_report_achieved
WINDOW w AS (PARTITION BY staff_id ORDER BY fecha_reporte)
ORDER BY nombre ASC,
fecha_reporte ASC
This is part of the result I get:
a | g | b | c | d | e | f | h
------------+----------+-------+--------------------------------------+----+----+--------+---------
2018-03-16 | 07:23 AM | 12879 | Andrade McCann, Wendy Daniele | 8 | 8 | 0.00 | 0.00
2018-03-19 | 06:22 AM | 12879 | Andrade McCann, Wendy Daniele | 8 | 8 | 0.00 | 0.00
2018-03-20 | 08:45 AM | 12879 | Andrade McCann, Wendy Daniele | 8 | 8 | 0.00 | 0.00
2018-03-21 | 07:15 AM | 12879 | Andrade McCann, Wendy Daniele | 8 | 8 | 0.00 | 0.00
2018-03-22 | 02:51 AM | 12879 | Andrade McCann, Wendy Daniele | 8 | 8 | 0.00 | 0.00
2018-03-23 | 07:29 AM | 12879 | Andrade McCann, Wendy Daniele | 8 | 8 | 0.00 | 0.00
2018-03-26 | 11:43 AM | 12879 | Andrade McCann, Wendy Daniele | 8 | 8 | 0.00 | 0.00
2018-03-27 | 08:45 AM | 12879 | Andrade McCann, Wendy Daniele | 8 | 8 | 0.00 | 0.00
2018-03-28 | 08:40 AM | 12879 | Andrade McCann, Wendy Daniele | 8 | 5 | -37.50 | -37.50
2018-03-16 | 10:26 AM | 12855 | Aarons Wujcik, Peter Charles | 6 | 8 | 33.33 | 33.33
2018-03-19 | 08:42 AM | 12855 | Aarons Wujcik, Peter Charles | 6 | 8 | 33.33 | 66.66
2018-03-20 | 09:11 AM | 12855 | Aarons Wujcik, Peter Charles | 6 | 8 | 33.33 | 99.99
2018-03-23 | 07:14 AM | 12855 | Aarons Wujcik, Peter Charles | 6 | 8 | 33.33 | 133.32
2018-03-26 | 11:17 AM | 12855 | Aarons Wujcik, Peter Charles | 6 | 8 | 33.33 | 166.65
2018-03-27 | 08:13 AM | 12855 | Aarons Wujcik, Peter Charles | 6 | 8 | 33.33 | 199.98
2018-03-16 | 09:33 AM | 13511 | Aray Cappello, Janet Andrea | 6 | 8 | 33.33 | 33.33
2018-03-19 | 10:23 AM | 13511 | Aray Cappello, Janet Andrea | 6 | 8 | 33.33 | 66.66
2018-03-20 | 08:21 AM | 13511 | Aray Cappello, Janet Andrea | 6 | 8 | 33.33 | 99.99
2018-03-21 | 08:30 AM | 13511 | Aray Cappello, Janet Andrea | 6 | 8 | 33.33 | 133.32
2018-03-22 | 07:52 AM | 13511 | Aray Cappello, Janet Andrea | 6 | 8 | 33.33 | 166.65
2018-03-23 | 08:17 AM | 13511 | Aray Cappello, Janet Andrea | 6 | 8 | 33.33 | 199.98
2018-03-26 | 11:08 AM | 13511 | Aray Cappello, Janet Andrea | 6 | 8 | 33.33 | 233.31
2018-03-27 | 07:40 AM | 13511 | Aray Cappello, Janet Andrea | 6 | 8 | 33.33 | 266.64
2018-03-28 | 08:31 AM | 13511 | Aray Cappello, Janet Andrea | 6 | 8 | 33.33 | 299.97
| | 13536 | Lascody Hernández, Christian Gabriel | 6 | | |
2018-03-16 | 10:30 AM | 12670 | Astudillo Méndez, Juan Ernesto | 6 | 8 | 33.33 | 33.33
2018-03-19 | 08:22 AM | 12670 | Campanaro Méndez, Juan Ernesto | 6 | 8 | 33.33 | 66.66
2018-03-20 | 10:02 AM | 12670 | Campanaro Méndez, Juan Ernesto | 6 | 8 | 33.33 | 99.99
2018-03-22 | 10:41 AM | 12670 | Campanaro Méndez, Juan Ernesto | 6 | 8 | 33.33 | 133.32
2018-03-26 | 11:17 AM | 12670 | Campanaro Méndez, Juan Ernesto | 6 | 8 | 33.33 | 166.65
2018-03-28 | 09:58 AM | 12670 | Campanaro Méndez, Juan Ernesto | 6 | 6 | 0.00 | 166.65
But there are many problems, take a look at Christian Gabriel, he was planned that fortnight for 6 hours to report every day, but he didn't report anything, I need to be able to see by each day (2018-03-16, 2018-03-19, 2018-03-20, 2018-03-21, etc) 6 hours in the column d (hours_to_report_per_day) and 0 (because he didn't report anything) in the e column (hours_reported_per_day).
In the same manner, Peter Charles, for example, didn't report at 21 and 22 of march, but that result doesn't appear and to see 6 in d column and 0 in e column.
Any help on how can I get the right result? Tell me if you don't understand very well my explanation and I'll do my best to improve it. Thanks.
Best Answer
first create the list of days:
-- then from that days_workables
-- then finally cross join days