This works as desired:
SELECT d.sensor, r.overlapping_ranges
FROM data d
JOIN LATERAL (
SELECT array_agg(range) AS overlapping_ranges
FROM unnest(d.ranges) range
WHERE range && '[873021700,873021800]'::numrange
) r ON overlapping_ranges IS NOT NULL;
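For context, the query assumes a table layout roughly like this (a sketch inferred from the query itself; the actual definition is not shown here):

-- Hypothetical layout: one row per sensor, all of its ranges in an array column
CREATE TABLE data (
  sensor text NOT NULL
, ranges numrange[] NOT NULL
);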
About LATERAL: see the Postgres documentation on LATERAL subqueries.
For big tables, it would be much more efficient to normalize your design with a separate ranges table (one range per row) instead of the ranges array. You can use a GiST index for that.
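A minimal sketch of that normalized design (table and index names are mine, not from the question):

-- One range per row instead of an array per sensor
CREATE TABLE sensor_range (
  sensor_id int NOT NULL
, range numrange NOT NULL
);

-- GiST index; the default opclass for range types supports the overlap operator &&
CREATE INDEX sensor_range_gist_idx ON sensor_range USING GIST (range);

-- The overlap test can then use the index:
-- SELECT * FROM sensor_range WHERE range && '[873021700,873021800]'::numrange;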
Solution for huge table
For a huge table like you mention in the comments (1 billion rows), I would consider a separate ranges table, optimized for size, and a BRIN index to go along with it.
Assuming:
- Read only (or mostly) data.
- A maximum of 6 fractional digits (scale) and a maximum of 18 digits total (precision). Scaled by 1000000, this fits into a bigint without loss, which is considerably cheaper to store. See below, and the quick round-trip check after this list.
- Postgres 9.5 or later.
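A quick check of the scaling assumption, using a value from the sample data (the round trip must reproduce the original numeric exactly):

SELECT (873021700.123456 * 1000000)::bigint AS scaled                -- 873021700123456
     , (873021700.123456 * 1000000)::bigint * 0.000001 AS restored;  -- 873021700.123456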
The BRIN operator class for range types shipped with Postgres 9.5 is range_inclusion_ops, which supports the overlaps operator &&.
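For illustration: if you kept a numrange column directly (like the sensor_range sketch above), a BRIN index could name that operator class explicitly, though it is the default for range types anyway:

-- Hypothetical; builds on the sensor_range sketch above
CREATE INDEX sensor_range_brin_idx ON sensor_range USING BRIN (range range_inclusion_ops);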
To optimize disk space some more, I would just save two bigint numbers (your numeric values multiplied by 1000000) and make it a functional (expression) BRIN index. Basically like this:
CREATE TABLE sensors (
sensor_id serial PRIMARY KEY
, sensor text NOT NULL);
CREATE TABLE ranges (
sensor_id int NOT NULL REFERENCES sensors
, range_low bigint NOT NULL
, range_hi bigint NOT NULL
);
INSERT INTO sensors (sensor) VALUES ('sensor1');
INSERT INTO ranges (sensor_id, range_low, range_hi) VALUES
(1, 872985609.0 * 1000000, 873017999.0 * 1000000) -- scaled
, (1, 872929250.000000 * 1000000, 872985609.000000 * 1000000);
-- BRIN index on the expression that reconstructs the range
CREATE INDEX ranges_brin_idx ON ranges USING BRIN (int8range(range_low, range_hi, '[]'));
Query to get the same result as before:
SELECT s.sensor, r.ranges
FROM (
SELECT sensor_id
, array_agg(numrange(range_low * .000001, range_hi * .000001, '[]')) AS ranges
FROM ranges
WHERE int8range(range_low, range_hi, '[]')
&& '[873021700000000,873021800000000]'::int8range -- scaled as well
GROUP BY sensor_id
) r
JOIN sensors s USING (sensor_id);
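To verify that the BRIN index is actually used, a sketch (the plan depends on table size, physical row order and statistics; BRIN only pays off for big tables with well-correlated data):

EXPLAIN (ANALYZE, BUFFERS)
SELECT sensor_id
FROM ranges
WHERE int8range(range_low, range_hi, '[]')
   && '[873021700000000,873021800000000]'::int8range;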
Storage size of bigint vs. numrange
A numrange with numbers of 15-digit precision occupies 32 bytes on disk; together with the int column, tuple header and item identifier that makes 64 bytes per row. The same row with two bigint columns (2 x 8 bytes) instead comes to 52 bytes total. That makes the table around 12 GB smaller (12 bytes saved per row x 1 billion rows). Index size is the same.
See for yourself:
SELECT pg_column_size((1::bigint, '[873021700.123456,873021800.123456]'::numrange))
, pg_column_size((1::bigint, 873021700123456::bigint, 873021800123456::bigint));
Do not have 4100 identical tables. That approach brings a host of problems and is repeatedly panned in this and other forums.
4100 "stations", each with, say 10 readings, per hour? 41000 is about 12/second (average), which is not very much if you had one row per datapoint per station. One table with 3 columns: hour (or datetime), station_id, reading. And batch the 10 rows in a single INSERT
statement for a little more efficiency.
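A minimal sketch of that layout and a batched INSERT, in MySQL syntax (names, types and values are illustrative only):

CREATE TABLE readings (
  dt DATETIME NOT NULL,                   -- hour (or full datetime) of the reading
  station_id SMALLINT UNSIGNED NOT NULL,
  reading FLOAT NOT NULL
);

-- one INSERT per station and hour, carrying all 10 readings
INSERT INTO readings (dt, station_id, reading) VALUES
  ('2024-01-06 06:00:00', 1, 21.4),
  ('2024-01-06 06:00:00', 1, 55.0),
  ('2024-01-06 06:00:00', 1, 1013.2);
  -- ... remaining readings for that station/hour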
Now you can fetch the data in a variety of ways: "just temperatures at 6am on weekends" becomes a simple and reasonably efficient query.
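For example, the 6am-on-weekends query could look roughly like this against the hypothetical readings table above (with only a generic reading column, restricting it to temperatures would additionally need a reading-type column or the wide layout below):

SELECT station_id, dt, reading
FROM readings
WHERE HOUR(dt) = 6
  AND DAYOFWEEK(dt) IN (1, 7);  -- 1 = Sunday, 7 = Saturday in MySQL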
Alternatively, have one table with 12 columns: hour/datetime, station_id, and 10 specific readings. This is likely to be more efficient in all ways. It also has the advantage that each measurement can get a fitting datatype. For temperature or humidity, TINYINT SIGNED (1 byte) allows -128..127, or DECIMAL(5,2) (3 bytes) covers +/-999.99, instead of a generic FLOAT (4 bytes).
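A sketch of that wider layout with per-measurement types (column names are illustrative, only a few of the 10 readings are shown, and the primary key is one reasonable choice rather than part of the answer):

CREATE TABLE readings_wide (
  dt DATETIME NOT NULL,
  station_id SMALLINT UNSIGNED NOT NULL,
  temperature TINYINT SIGNED,   -- 1 byte, -128..127
  humidity DECIMAL(5,2),        -- 3 bytes, +/-999.99
  -- ... further reading columns with fitting types ...
  PRIMARY KEY (station_id, dt)
);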
The alternatives probably take a similar amount of disk space. This database will be rather big after a year; you should probably estimate the future size and think ahead about indexing, and other issues.
Best Answer
Finally I've found a solution. I create a string array using the CONCAT and GROUP_CONCAT functions and then cast it to JSON.
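Roughly like this, as a sketch of the idea (table and column names are placeholders, since the original query is not shown; CAST(... AS JSON) needs MySQL 5.7.8+):

-- build a JSON array literal per group, then cast the string to JSON
SELECT station_id
     , CAST(CONCAT('[', GROUP_CONCAT(reading ORDER BY dt), ']') AS JSON) AS readings
FROM readings
GROUP BY station_id;

For string values, each element would additionally need quoting inside the CONCAT before the result is valid JSON.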