PostgreSQL – Grouping Data by Range of Years

postgresql

I've got a big table (~9M rows) and want to group the rows on a field containing the year. So far that's pretty easy:

// greatly simplified:
SELECT count(*), year FROM dataset GROUP BY year ORDER BY 2;

We defined some irregular time periods spanning multiple years:

<1945, 1946-1964, 1965-1974, 1975-1991, 1992-2005 and >2005

I've got no clue on how to group these results in the group by clause. I could make subquery's for every time period.

SELECT
  ( SELECT count(*) FROM dataset WHERE year <= 1945 AND ...... ) AS pre1945,
  ( ....) AS period2,
  ....
FROM dataset

But that feels not right and I'm wondering if it was possible to let Postgresql do it. Especially because the query is a strong simplification of the real query: it has multiple conditions, amongst them a ST_within clause spanning four tables. So choosing the subquery-approach results in a bloated query.

Is there a better way to create this result?

Best Answer

Use conditional counting:

select count(case when year <= 1945 then 1 end) as pre1945,
       count(case when year between 1946 and 1964 then 1 end) as period2,
       count(case when year between 1965 and 1974 then 1 end) as period3,
       ...
from ...
where ...;

This works because count() ignores null values and the case statement returns a null for values outside of the range it tests for (an else null is implicit).

With the upcoming 9.4 version you can re-write this as

select count(*) filter (where year <= 1945) as pre1945,
       count(*) filter (where year between 1946 and 1964) as period2,
       count(*) filter (where year between 1965 and 1974) as period3,
       ...
from ...
where ...;