Postgresql – COUNT beginning of, mid of & end per month for many years

datepostgresql

In my last post PostgreSQL: How to get a weekly average over a year including 'gap' weeks per animal from years of data? with the fiddle http://sqlfiddle.com/#!17/0bff3/1
I got a weekly count. But it was difficult to graph.

I decided to go a different route. doing the month is 3 parts (beginning, middle and end)
My new objective: (and I won't graph this)

|animal|start             |peak      |end        |
|dog   |beginning of may  |mid of may|end of june|
|cat   |beginning of march|end of may|mid of june|

I was thinking of using a case like structure to assign a part of the month… something like the following…

SELECT animal, count(*), 
min(cast(concat(date_part('month',from_date),'.',date_part('day',from_date)) as numeric)) as start, 
max(cast(concat(date_part('month',from_date),'.',date_part('day',from_date)) as numeric)) as end
    FROM brachyopa_cleaned bc
    AND concat(from_month,'/',from_day) not in ('12/31','01/01','1/1') -- dates where month and day are unknown
GROUP BY animal
ORDER BY aminal

… and then find per species per month section the amount of occurrences to find the month section (there are in fact 12×3 'mm-dd' sections in a year) with the highest count.

Can someone give me some pointers how to do this?

[EDIT1: Changed the SELECT in what I am using so far, now I need somehow to find the peak, the 10 days period where there is the max count]
[EDIT2: rewrote the …per specie per month… section]

Best Answer

One thing I can think of, is to generate a date range from the input date that splits the month into three parts. It's quite easy to keep the first two parts constant and have the third one vary:

create function get_first(p_input date)
  returns daterange
as
$$
  select daterange(date_trunc('month', p_input)::date, date_trunc('month', p_input)::date + 10, '[]')
$$
language sql
immutable;

create function get_second(p_input date)
  returns daterange
as
$$
  select daterange(date_trunc('month', p_input)::date + 11, date_trunc('month', p_input)::date + 20, '[]')
$$
language sql
immutable;

create function get_third(p_input date)
  returns daterange
as
$$
  select daterange(date_trunc('month', p_input)::date + 21, (date_trunc('month', p_input) + interval '1 month')::date, '[)')
$$
language sql
immutable;

With that you can do the following aggregation:

select animal,
       count(*) filter (where get_first(from_date) @> from_date) as start,
       count(*) filter (where get_second(from_date) @> from_date) as mid,
       count(*) filter (where get_third(from_date) @> from_date) as "end"
from occurrence
group by animal;

The three functions can be enhanced/changed to deal with the variable length of a month, but that would require much more complex calculations. It would boil down to making the +1 or +20 a percentage of the number of days in the month.

If you need to find the highest of those three parts, you can unpivot the values and then using distinct on() to get it:

select distinct on (x.animal) x.*
from (
  select animal,
         to_char(from_date, 'yyyy-mm') as month,
         count(*) filter (where get_first(from_date) @> from_date) as start,
         count(*) filter (where get_second(from_date) @> from_date) as mid,
         count(*) filter (where get_third(from_date) @> from_date) as "end"
  from occurrence
  group by animal, month
) t
  cross join lateral (
    values (animal, start, month, 'start'), (animal, mid, month, 'mid'), (animal, "end", month, 'end')
  ) x (animal, num, month, section) --on true 
order by x.animal, num desc

The cross join lateral essentially turns the three columns into rows and then distinct on () is used to pick the highest value of the three for each animal

Online example