I'm trying to generate a table that will show the number of tasks that were completed in each month.
The table has the following columns:
- id
- job_name
- job_status
- date_completed
My goal is to query the number of completed jobs per month between two values (e.g. 2021-02-01 [Feb 2021] to 2022-02-28 [Feb. 2021]), with the condition that it will only count those with status value '1' (1 = complete, 0 = ongoing) with the query producing this table:
Months | Jobs Completed
---------+-----------------------
Feb 2020 | 0
Mar 2020 | 2
Apr 2020 | 0
...
Jan 2022 | 1
---------+------------------------
I managed to generate the series and have values to it, but the said table has the following problems:
- The order of the months are not arranged in ascending order (From Feb 2021 – Jan 2022, the generated series was random)
- The value of the months (the "jobs_completed") copies the highest value, rather than show their own. (e.g. if there are 3 jobs done in March 2021, and 0 on other months, all of the months will register their "jobs_completed" value as 3)
I have made a db-fiddle to show my problem: https://www.db-fiddle.com/f/gH7C7R6udMucjSATpw8Erh/0
SELECT to_char(generate_series('2021-02-01'::DATE, '2022-01-31'::DATE, '1 month'), 'Mon YYYY') AS month,
COUNT(tbl.job_status) FILTER(WHERE job_status = 1) as jobs_completed
FROM monthly_tasks tbl
WHERE tbl.date_completed BETWEEN '2021-01-01' AND '2021-12-01'
GROUP BY month;
Thank you in advance!
Best Answer
To sort by date, the ORDER BY should be applied on the date or timestamp column. If the ORDER BY is applied on the result of to_char(), which is text, then the rows will be sorted alphabetically... and since August starts with an A, it will be first.
So, using the test tables from nbk's answer:
Another, muuuch simpler version using a subquery:
You can fill the table with test data:
...and test query performance with EXPLAIN ANALYZE
First query: 5ms
First query with "date_completed >='2021-02-01' AND date_completed < '2022-01-31'" removed from the JOIN: 45ms
Second query: 1.8ms
Note the query planner isn't smart enough to figure that this returns 12 rows:
but it is smart enough to figure out that this one does:
...probably because the arguments are integers. So the latter may result in faster queries.