PostgreSQL – Using generate_series for Multiple Record Types

I have two tables that I want to query, pests and pest_counts, which look like this:

CREATE TABLE pests(id,name)
AS VALUES
  (1,'Thrip'),
  (2,'Fungus Gnats');

CREATE TABLE pest_counts(id,pest_id,date,count)
AS VALUES
  (1,1,'2015-01-01'::date,14),
  (2,2,'2015-01-02'::date,5);

I want to use postgres' generate_series to show the number of each type of pest that was found for the date series:

expected results

name         | date       | count
-------------+------------+-------
Thrip        | 2015-01-01 | 14
Thrip        | 2015-01-02 | 0
....
Fungus Gnats | 2015-01-01 | 0
Fungus Gnats | 2015-01-02 | 5
...

I know I'll need something like the following but I'm not exactly sure how to do the rest:

SELECT date FROM generate_series('2015-01-01'::date, '2015-12-31'::date, '1 day') date

Best Answer

I usually solve such problems by setting up a table for all the possible data points (here the pests and dates). This is easily achieved by a CROSS JOIN, see the WITH query below.

Then, as the finishing step, I just (outer) join the existing measurements, based on the pest ID and date - optionally giving a default for the missing values via COALESCE().

So, the whole query is:

WITH data_points AS (
    -- One row per (pest, day), whether or not a count was recorded
    SELECT id, name, i::date AS i
    FROM pests
    CROSS JOIN generate_series('2015-01-01'::date, '2015-01-05'::date, '1 day') t(i)
)
SELECT d.name, d.i AS date, COALESCE(p.count, 0) AS count
FROM data_points AS d
LEFT JOIN pest_counts AS p
    ON d.id = p.pest_id
    AND d.i = p.date
ORDER BY d.id, d.i;

Check it at work on SQLFiddle.

Note: when either the table(s) or the generated series are big, doing the CROSS JOIN inside a CTE might be a bad idea. Before PostgreSQL 12, a CTE is always materialized, so all the rows are produced regardless of whether there is data for a given day. In this case, do the same in the FROM clause, as a parenthesized sub-join, instead of referencing data_points. This way the planner has a better understanding of the rows affected and the possibilities for using indexes. I use the CTE in the example because it looks cleaner.
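
As a rough sketch of the sub-join form described in the note, assuming the pests and pest_counts tables from the question (with their date and count columns), the CROSS JOIN can sit directly in the FROM clause. The aliases ps, pc and g are illustrative only:

```sql
-- Sketch only: same result as the CTE version, but the cross join is a
-- parenthesized sub-join, so the planner sees it as part of one join tree.
SELECT ps.name,
       g.i::date            AS date,
       COALESCE(pc.count, 0) AS count
FROM (pests AS ps
      CROSS JOIN generate_series('2015-01-01'::date,
                                 '2015-01-05'::date,
                                 interval '1 day') AS g(i))
LEFT JOIN pest_counts AS pc
       ON pc.pest_id = ps.id
      AND pc.date = g.i::date
ORDER BY ps.id, g.i;
```

Whether this actually beats the CTE depends on the PostgreSQL version and the data, so comparing the EXPLAIN output of both forms is the safest way to decide.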

Related Solutions

PostgreSQL – Let PostgreSQL report the offending rows when a multi-row INSERT fails because of mismatched geometry types

The problem here is that the types aren't mismatched. PostGIS provides very few PostgreSQL types, namely:

box2d — A box composed of xmin, ymin, xmax, ymax. Often used to return the 2d enclosing box of a geometry.
box3d — A box composed of xmin, ymin, zmin, xmax, ymax, zmax. Often used to return the 3d extent of a geometry or collection of geometries.
geometry — Planar spatial data type.
geometry_dump — A spatial datatype with two fields - geom (holding a geometry object) and path[] (a 1-d array holding the position of the geometry within the dumped object.)
geography — Ellipsoidal spatial data type.

That said, the PostgreSQL type itself clearly does not distinguish between the different subtypes of geometry. When a column is restricted to a subtype such as Point or MultiPoint, a single row that violates the subtype causes the whole transaction to fail.

Create an ETL script that loads into a plain geometry column; then you can select the rows that are not of the expected subtype with ST_GeometryType or GeometryType.

CREATE EXTENSION postgis;

CREATE TABLE t (
    id integer,
    p geometry
);

INSERT INTO t
VALUES
    ( 1, ST_GeometryFromText('Point(0 0)')      ),
    ( 2, ST_GeometryFromText('Point(1 2)')      ),
    ( 3, ST_GeometryFromText('MultiPoint(2 3)') ),
    ( 4, ST_GeometryFromText('Point(5 23)')     ),
    ( 5, ST_GeometryFromText('Point(42 36)')    );

Now you can run,

SELECT id, ST_AsText(p)
FROM t
WHERE GeometryType(p) <> 'POINT';

And you'll get,

 id |    st_astext    
----+-----------------
  3 | MULTIPOINT(2 3)
(1 row)

Alternatively, you can avoid this problem by casting all types to MultiPoint with ST_Multi().
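
A minimal sketch of that alternative, assuming the table t from above. The target table t_multi is a hypothetical name introduced here for illustration, with its column restricted to MultiPoint:

```sql
-- Hypothetical target table: the type modifier restricts p to MultiPoint
CREATE TABLE t_multi (
    id integer,
    p  geometry(MultiPoint)
);

-- ST_Multi() turns single Points into MultiPoints and returns
-- geometries that are already MULTI* unchanged, so every row
-- from t satisfies the subtype restriction.
INSERT INTO t_multi (id, p)
SELECT id, ST_Multi(p)
FROM t;
```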

PostgreSQL – Generate Continuous Series for Multiple Categories

You can use the generate_series() function:

select
    d.d::date as date, 
    t.update_date,  
    t.category_1, 
    t.category_2,
    t.value
from 
    t, 
    generate_series(start_date, end_date, interval '1 day') as d(d) ;

It works basically by taking each row from t and producing a set of rows, one for each day from start_date up to (and including) end_date.

Test at dbfiddle.uk
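
To make the per-row expansion concrete, here is a small self-contained sketch. The inline rows, the alias r, and the sample dates and values are made up for illustration; only the column names come from the query above. It writes the implicit lateral reference out explicitly as CROSS JOIN LATERAL:

```sql
-- Sketch with made-up data: each input row is expanded into one row per day
-- between its own start_date and end_date (both inclusive).
SELECT r.category_1,
       d.d::date AS date,
       r.value
FROM (VALUES
        ('a', date '2020-01-01', date '2020-01-03', 10),
        ('b', date '2020-01-02', date '2020-01-02', 20)
     ) AS r(category_1, start_date, end_date, value)
CROSS JOIN LATERAL
     generate_series(r.start_date, r.end_date, interval '1 day') AS d(d)
ORDER BY r.category_1, d.d;
```

Row 'a' spans three days and so yields three output rows, while row 'b' covers a single day and yields one.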
