PostgreSQL – Grouping Data by Range of Years

postgresql

I've got a big table (~9M rows) and want to group the rows on a field containing the year. So far that's pretty easy:

-- greatly simplified:
SELECT count(*), year FROM dataset GROUP BY year ORDER BY 2;

We defined some irregular time periods spanning multiple years:

<=1945, 1946-1964, 1965-1974, 1975-1991, 1992-2005 and >2005

I have no idea how to express these periods in the GROUP BY clause. I could write a subquery for every time period:

SELECT
  ( SELECT count(*) FROM dataset WHERE year <= 1945 AND ...... ) AS pre1945,
  ( ....) AS period2,
  ....
FROM dataset

But that doesn't feel right, and I'm wondering whether PostgreSQL can do this directly. The query shown is a strong simplification of the real one, which has multiple conditions, among them an ST_Within clause spanning four tables, so the subquery approach results in a bloated query.

Is there a better way to create this result?

Best Answer

Use conditional counting:

select count(case when year <= 1945 then 1 end) as pre1945,
       count(case when year between 1946 and 1964 then 1 end) as period2,
       count(case when year between 1965 and 1974 then 1 end) as period3,
       ...
from ...
where ...;

This works because count() ignores null values, and the CASE expression returns null for values outside the range it tests for (an ELSE NULL is implicit).
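To see the null handling in isolation, here is a minimal sketch on a throwaway inline table (the alias t and the column x are made up for illustration):

```sql
-- Three rows, one of them NULL.
SELECT count(*)                            AS all_rows,         -- counts every row
       count(x)                            AS non_null_rows,    -- skips the NULL
       count(CASE WHEN x > 1 THEN 1 END)   AS greater_than_one  -- CASE yields NULL unless x > 1
FROM (VALUES (1), (NULL), (3)) AS t(x);
-- all_rows = 3, non_null_rows = 2, greater_than_one = 1
```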

As of PostgreSQL 9.4 you can rewrite this with the FILTER clause:

select count(*) filter (where year <= 1945) as pre1945,
       count(*) filter (where year between 1946 and 1964) as period2,
       count(*) filter (where year between 1965 and 1974) as period3,
       ...
from ...
where ...;
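If you would rather get one row per period than one column per period, the same CASE logic can go into the GROUP BY. This is only a sketch: the inline dataset and the period labels are hypothetical stand-ins for the real table and naming.

```sql
-- Hypothetical sample data standing in for the real "dataset" table.
WITH dataset(year) AS (
    VALUES (1930), (1945), (1950), (1970), (1980), (2000), (2010), (2020)
)
SELECT CASE
           WHEN year <= 1945               THEN '1945 and earlier'
           WHEN year BETWEEN 1946 AND 1964 THEN '1946-1964'
           WHEN year BETWEEN 1965 AND 1974 THEN '1965-1974'
           WHEN year BETWEEN 1975 AND 1991 THEN '1975-1991'
           WHEN year BETWEEN 1992 AND 2005 THEN '1992-2005'
           WHEN year > 2005                THEN 'after 2005'
       END      AS period,
       count(*) AS n
FROM dataset
GROUP BY 1           -- group by the first output column (the CASE expression)
ORDER BY min(year);  -- keeps the periods in chronological order
```

Note the trade-off: a period that matches no rows simply does not appear in this result, whereas the conditional counts above always return a column for it (with 0).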

Related Solutions

PostgreSQL – Multi-Valued Range Subtraction

Looks like simple math. Assuming that all the ranges are of the same inclusive-exclusive type [):

SELECT some_data, int4range(lower(a), LEAST(upper(a),lower(b))) AS ab
FROM test
WHERE lower(a) < lower(b) 

UNION ALL

SELECT some_data, int4range(GREATEST(lower(a),upper(b)), upper(a)) 
FROM test
WHERE upper(b) < upper(a) 

UNION ALL

SELECT some_data, a
FROM test
WHERE b = int4range(0,0)
   OR a = int4range(0,0) ;

Tested at SQL-Fiddle.
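As a worked example, the query can be run against a couple of inline rows. The sample values below are made up for illustration; only the table name test and the columns some_data, a and b come from the query above.

```sql
-- Hypothetical sample rows for the "test" table.
WITH test(some_data, a, b) AS (
    VALUES ('b inside a', int4range(1, 10), int4range(3, 5)),
           ('b is empty', int4range(1, 10), int4range(0, 0))
)
SELECT some_data, int4range(lower(a), LEAST(upper(a), lower(b))) AS ab
FROM test
WHERE lower(a) < lower(b)       -- part of a to the left of b

UNION ALL

SELECT some_data, int4range(GREATEST(lower(a), upper(b)), upper(a))
FROM test
WHERE upper(b) < upper(a)       -- part of a to the right of b

UNION ALL

SELECT some_data, a
FROM test
WHERE b = int4range(0, 0)       -- int4range(0,0) is the empty range
   OR a = int4range(0, 0);
```

For 'b inside a' this returns two rows, [1,3) and [5,10); for 'b is empty' it returns a unchanged as [1,10), because lower() and upper() of an empty range are null and the first two branches filter that row out. The row order of a UNION ALL without ORDER BY is not guaranteed.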

PostgreSQL – Creating DOMAIN on a Range of Years

A domain is a constraint. It doesn't hold any values whatsoever. It can only restrict a type to exclude values. If you need to store a range, use the range types.

As a side note, you can in fact create a domain on a range type, for example if you want all of your ranges to start on a Wednesday:

CREATE DOMAIN foo AS daterange
  -- dow counts from Sunday = 0, so Wednesday is 3
  CHECK ( EXTRACT(dow FROM lower(VALUE)) = 3 );

SELECT daterange('2018-01-02'::date,'2018-02-05'::date);

SELECT daterange('2018-01-02'::date,'2018-02-05'::date)::foo;
ERROR:  value for domain foo violates check constraint "foo_check"

Moreover, you can also ensure the range always spans exactly 365 days (one year, leap years aside):

CREATE DOMAIN bar AS daterange
  CHECK (
    (lower(VALUE)::timestamp without time zone - upper(VALUE)::timestamp without time zone) = '-365 days'
  );

But if you're going to do that, that is, force every range to be a fixed one-year span, I suggest you just store the starting year as a smallint and be done with it.
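A minimal sketch of that alternative follows. The table name yearly_period, the column start_year and the 1900-2100 bounds are hypothetical, and it assumes each period is a calendar year beginning on January 1:

```sql
-- Hypothetical table: store only the starting year.
CREATE TABLE yearly_period (
    start_year smallint NOT NULL
        CHECK (start_year BETWEEN 1900 AND 2100)  -- illustrative bounds
);

INSERT INTO yearly_period VALUES (2018);

-- Derive the one-year daterange on demand instead of storing it.
SELECT start_year,
       daterange(make_date(start_year, 1, 1),
                 make_date(start_year + 1, 1, 1)) AS period
FROM yearly_period;
-- period = [2018-01-01,2019-01-01)
```

Building the range from the year also handles leap years without a fixed 365-day check.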
