PostgreSQL – Selecting Overlapping Elements from Array of Ranges

Tags: array | postgresql | postgresql-9.3 | range-types | set-returning-functions

The table I need to search contains an array of numrange values illustrated by:

CREATE TABLE data ( sensor varchar(25), ranges numrange[] );
INSERT INTO data VALUES
  ('sensor0','{"[872985609.0,873017999.0]","[873021600.0,873035999.0]","[873039600.0,873072070.0]"}'::numrange[])
, ('sensor1','{"[872929250.000000,872985609.000000]"}'::numrange[]);

Searching for rows where any element of ranges overlaps a specified range is easy with ANY:

SELECT * FROM data WHERE '[873021700,873021800]'::numrange && ANY (ranges)
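
With the sample data above, this returns the entire sensor0 row (sensor1's only range ends before the search range begins):

sensor      ranges
sensor0     {"[872985609.0,873017999.0]","[873021600.0,873035999.0]","[873039600.0,873072070.0]"}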

I would like a SELECT statement that returns fields from the table along with only those array elements that overlap a specified range, like:

SELECT sensor,[array of overlapping numranges] FROM data WHERE '[873021700,873021800]'::numrange && ANY (ranges)

resulting in:

sensor      ranges
sensor0     {"[873021600.0,873035999.0]"}

Is this possible without unnest()ing the array in a subselect? If not, what would be an efficient way to unnest() when the arrays are potentially very large?

Best Answer

This works as desired:

SELECT d.sensor, r.overlapping_ranges
FROM   data d
JOIN   LATERAL (
   SELECT array_agg(range) AS overlapping_ranges  -- re-aggregate the matching elements
   FROM   unnest(d.ranges) range                  -- one row per array element
   WHERE  range && '[873021700,873021800]'::numrange
   ) r ON r.overlapping_ranges IS NOT NULL;       -- drop rows without any match
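
For the sample data, that yields exactly the requested result:

sensor      overlapping_ranges
sensor0     {"[873021600.0,873035999.0]"}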

About LATERAL: the subquery can reference d.ranges from the outer table, so unnest() is applied to each row's array and only the matching elements are re-aggregated.

For big tables, it would be much more efficient to normalize your design with a separate ranges table (one range per row) instead of the ranges array. You can then use a GiST index, which supports the overlap operator && for range types directly.
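
A minimal sketch of such a normalized layout (the table and index names here are illustrative, not from the original):

CREATE TABLE sensor_ranges (
  sensor varchar(25)
, range  numrange
);

CREATE INDEX sensor_ranges_gist_idx ON sensor_ranges USING GIST (range);

-- the overlap test can use the index directly:
SELECT sensor, array_agg(range) AS overlapping_ranges
FROM   sensor_ranges
WHERE  range && '[873021700,873021800]'::numrange
GROUP  BY sensor;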


Solution for huge table

For a huge table like you mention in the comments (1 billion rows), I would consider a separate ranges table optimized for size, with a BRIN index to go along with it.

Assuming:

  • Read only (or mostly) data.
  • A maximum of 6 fractional digits (scale) and a maximum of 18 digits total (precision). Scaled by 1000000, such a number fits into a bigint without loss (see the round-trip example after this list), which is considerably cheaper to store. See below.
  • Postgres 9.5 or later.
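
A quick round-trip demonstrating that the scaling is lossless (the literal is just an example value within those limits):

SELECT (873021700.123456 * 1000000)::bigint           AS scaled     -- 873021700123456
     , (873021700.123456 * 1000000)::bigint * .000001 AS unscaled;  -- 873021700.123456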

The BRIN operator class for range types shipped with Postgres 9.5 is range_inclusion_ops, which supports the overlap operator &&.
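
For illustration, the opclass can also be spelled out explicitly (it is the default for range types anyway); the demo table here is hypothetical:

CREATE TABLE demo (r numrange);
CREATE INDEX demo_brin_idx ON demo USING BRIN (r range_inclusion_ops);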

To optimize disk space some more, I would just store two bigint numbers (your numeric values multiplied by 1000000) and create a functional BRIN index on an int8range expression over them. Basically like this:

CREATE TABLE sensors (
  sensor_id serial PRIMARY KEY
, sensor text NOT NULL);

CREATE TABLE ranges (
  sensor_id int NOT NULL REFERENCES sensors
, range_low bigint NOT NULL
, range_hi  bigint NOT NULL
);

INSERT INTO sensors (sensor) VALUES ('sensor0'), ('sensor1');  -- serial assigns ids 1 and 2

INSERT INTO ranges (sensor_id, range_low, range_hi) VALUES
   (1, 872985609.0 * 1000000, 873017999.0 * 1000000)  -- scaled
 , (1, 873021600.0 * 1000000, 873035999.0 * 1000000)
 , (1, 873039600.0 * 1000000, 873072070.0 * 1000000)
 , (2, 872929250.000000 * 1000000, 872985609.000000 * 1000000);

CREATE INDEX ranges_brin_idx ON ranges USING BRIN (int8range(range_low, range_hi, '[]'));

Query to get the same result as before:

SELECT s.sensor, r.ranges
FROM  (
   SELECT sensor_id
        , array_agg(numrange(range_low * .000001, range_hi * .000001, '[]')) AS ranges
   FROM   ranges
   WHERE  int8range(range_low, range_hi, '[]')
       && '[873021700000000,873021800000000]'::int8range  -- scaled as well
   GROUP  BY sensor_id
   ) r
JOIN   sensors s  USING (sensor_id);
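
With the sample rows above, this again returns just sensor0; the unscaled values now display six fractional digits:

sensor      ranges
sensor0     {"[873021600.000000,873035999.000000]"}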

Storage size of bigint vs. numrange

A numrange built from numbers of 15 digits precision occupies 32 bytes on disk, resulting in 64 bytes per row (including the int column, tuple header, and item identifier).

The same data as two bigint columns (2 x 8 bytes) comes to 52 bytes total. Saving 12 bytes per row makes a 1-billion-row table around 12 GB smaller. Index size is the same.

See for yourself:

SELECT pg_column_size((1::int, '[873021700.123456,873021800.123456]'::numrange))
     , pg_column_size((1::int, 873021700123456::bigint, 873021800123456::bigint));
