Postgresql – COUNT beginning of, mid of & end per month for many years

datepostgresql

In my last post PostgreSQL: How to get a weekly average over a year including 'gap' weeks per animal from years of data? with the fiddle http://sqlfiddle.com/#!17/0bff3/1
I got a weekly count. But it was difficult to graph.

I decided to go a different route. doing the month is 3 parts (beginning, middle and end)
My new objective: (and I won't graph this)

|animal|start             |peak      |end        |
|dog   |beginning of may  |mid of may|end of june|
|cat   |beginning of march|end of may|mid of june|

I was thinking of using a case like structure to assign a part of the month… something like the following…

SELECT animal, count(*), 
min(cast(concat(date_part('month',from_date),'.',date_part('day',from_date)) as numeric)) as start, 
max(cast(concat(date_part('month',from_date),'.',date_part('day',from_date)) as numeric)) as end
    FROM brachyopa_cleaned bc
    AND concat(from_month,'/',from_day) not in ('12/31','01/01','1/1') -- dates where month and day are unknown
GROUP BY animal
ORDER BY aminal

… and then find per species per month section the amount of occurrences to find the month section (there are in fact 12×3 'mm-dd' sections in a year) with the highest count.

Can someone give me some pointers how to do this?

[EDIT1: Changed the SELECT in what I am using so far, now I need somehow to find the peak, the 10 days period where there is the max count]
[EDIT2: rewrote the …per specie per month… section]

Best Answer

One thing I can think of, is to generate a date range from the input date that splits the month into three parts. It's quite easy to keep the first two parts constant and have the third one vary:

create function get_first(p_input date)
  returns daterange
as
$$
  select daterange(date_trunc('month', p_input)::date, date_trunc('month', p_input)::date + 10, '[]')
$$
language sql
immutable;

create function get_second(p_input date)
  returns daterange
as
$$
  select daterange(date_trunc('month', p_input)::date + 11, date_trunc('month', p_input)::date + 20, '[]')
$$
language sql
immutable;

create function get_third(p_input date)
  returns daterange
as
$$
  select daterange(date_trunc('month', p_input)::date + 21, (date_trunc('month', p_input) + interval '1 month')::date, '[)')
$$
language sql
immutable;

With that you can do the following aggregation:

select animal,
       count(*) filter (where get_first(from_date) @> from_date) as start,
       count(*) filter (where get_second(from_date) @> from_date) as mid,
       count(*) filter (where get_third(from_date) @> from_date) as "end"
from occurrence
group by animal;

The three functions can be enhanced/changed to deal with the variable length of a month, but that would require much more complex calculations. It would boil down to making the +1 or +20 a percentage of the number of days in the month.

If you need to find the highest of those three parts, you can unpivot the values and then using distinct on() to get it:

select distinct on (x.animal) x.*
from (
  select animal,
         to_char(from_date, 'yyyy-mm') as month,
         count(*) filter (where get_first(from_date) @> from_date) as start,
         count(*) filter (where get_second(from_date) @> from_date) as mid,
         count(*) filter (where get_third(from_date) @> from_date) as "end"
  from occurrence
  group by animal, month
) t
  cross join lateral (
    values (animal, start, month, 'start'), (animal, mid, month, 'mid'), (animal, "end", month, 'end')
  ) x (animal, num, month, section) --on true 
order by x.animal, num desc

The cross join lateral essentially turns the three columns into rows and then distinct on () is used to pick the highest value of the three for each animal

Online example

Related Solutions

Custom calendar for calculating workdays per month

I'd start with a table that contains the date and some convenience columns plus a workday indicator. (date full db date column, day, month, year, day_of_week, work_day Y/N field, isvacation, isweekday).

Populate that with values in the time range you're going to use. Set appropriate flags (e.g. update date_table set isweekday where day_of_week between 1 and 5) and update all the work_day and isvacation flags by vacation that your company observes. This is going to be a manual process, but not daunting. Some holidays are the same day each year, some are not. A vacation day could fall on the weekend, but be observed the previous Friday, etc.

Then you can calculate work hours pretty easily (e.g. select count(1)*8 from date_table where month = 1 and year = 2014 and work_day = 'Y'

Once the table is created, you can use it for lots of stuff. Add a quarter field, first work day of the month field, day names, month names, etc.. Pretty handy, especially for reporting, just join on the truncated date column in your source data to the date field in the table.

Postgresql – Group By Month With Overlapping years

date_trunc truncates the date to the specified part. You need extract

select extract('month' from dateofwork) from randomdate group by 1 order by 1;

+-------------+
|   date_part |
|-------------|
|           1 |
|           2 |
+-------------+
SELECT 2
Time: 0.005s

Best Answer

Related Solutions

Custom calendar for calculating workdays per month

Postgresql – Group By Month With Overlapping years

Related Question