This works as desired:
SELECT d.sensor, r.overlapping_ranges
FROM data d
JOIN LATERAL (
SELECT array_agg(range) AS overlapping_ranges
FROM unnest(d.ranges) range
WHERE range && '[873021700,873021800]'::numrange
) r ON overlapping_ranges IS NOT NULL;
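For context, the query assumes a table layout roughly like this (a sketch inferred from the query itself; the actual definition is not shown here):

-- Hypothetical layout: one row per sensor, all of its ranges in an array column
CREATE TABLE data (
  sensor text NOT NULL
, ranges numrange[] NOT NULL
);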
About LATERAL: see the Postgres documentation on LATERAL subqueries.
For big tables, it would be much more efficient to normalize your design with a separate ranges table (one range per row) instead of the ranges array. You can use a GiST index for that.
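A minimal sketch of that normalized design (table and index names are mine, not from the question):

-- One range per row instead of an array per sensor
CREATE TABLE sensor_range (
  sensor_id int NOT NULL
, range numrange NOT NULL
);

-- GiST index; the default opclass for range types supports the overlap operator &&
CREATE INDEX sensor_range_gist_idx ON sensor_range USING GIST (range);

-- The overlap test can then use the index:
-- SELECT * FROM sensor_range WHERE range && '[873021700,873021800]'::numrange;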
Solution for huge table
For a huge table like you mention in the comments (1 billion rows), I would consider a separate ranges table, optimized for size, and a BRIN index to go along with it.
Assuming:
- Read only (or mostly) data.
- A maximum of 6 fractional digits (scale) and a maximum of 18 digits total (precision). Scaled by 1000000, this fits into a bigint without loss, which is considerably cheaper to store. See below, and the quick round-trip check after this list.
- Postgres 9.5 or later.
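A quick check of the scaling assumption, using a value from the sample data (the round trip must reproduce the original numeric exactly):

SELECT (873021700.123456 * 1000000)::bigint AS scaled                -- 873021700123456
     , (873021700.123456 * 1000000)::bigint * 0.000001 AS restored;  -- 873021700.123456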
The BRIN operator class for range types shipped with Postgres 9.5 is range_inclusion_ops, which supports the overlaps operator &&.
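For illustration: if you kept a numrange column directly (like the sensor_range sketch above), a BRIN index could name that operator class explicitly, though it is the default for range types anyway:

-- Hypothetical; builds on the sensor_range sketch above
CREATE INDEX sensor_range_brin_idx ON sensor_range USING BRIN (range range_inclusion_ops);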
To optimize disk space some more, I would just save two bigint numbers (your numeric values multiplied by 1000000) and make it a functional (expression) BRIN index. Basically like this:
CREATE TABLE sensors (
sensor_id serial PRIMARY KEY
, sensor text NOT NULL);
CREATE TABLE ranges (
sensor_id int NOT NULL REFERENCES sensors
, range_low bigint NOT NULL
, range_hi bigint NOT NULL
);
INSERT INTO sensors (sensor) VALUES ('sensor1');
INSERT INTO ranges (sensor_id, range_low, range_hi) VALUES
(1, 872985609.0 * 1000000, 873017999.0 * 1000000) -- scaled
, (1, 872929250.000000 * 1000000, 872985609.000000 * 1000000);
-- BRIN index on the expression that reconstructs the range
CREATE INDEX ranges_brin_idx ON ranges USING BRIN (int8range(range_low, range_hi, '[]'));
Query to get the same result as before:
SELECT s.sensor, r.ranges
FROM (
SELECT sensor_id
, array_agg(numrange(range_low * .000001, range_hi * .000001, '[]')) AS ranges
FROM ranges
WHERE int8range(range_low, range_hi, '[]')
&& '[873021700000000,873021800000000]'::int8range -- scaled as well
GROUP BY sensor_id
) r
JOIN sensors s USING (sensor_id);
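To verify that the BRIN index is actually used, a sketch (the plan depends on table size, physical row order and statistics; BRIN only pays off for big tables with well-correlated data):

EXPLAIN (ANALYZE, BUFFERS)
SELECT sensor_id
FROM ranges
WHERE int8range(range_low, range_hi, '[]')
   && '[873021700000000,873021800000000]'::int8range;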
Storage size of bigint vs. numrange
A numrange with numbers of 15-digit precision occupies 32 bytes on disk; together with the int column, tuple header and item identifier that makes 64 bytes per row. The same row with two bigint columns (2 x 8 bytes) instead comes to 52 bytes total. That makes the table around 12 GB smaller (12 bytes saved per row x 1 billion rows). Index size is the same.
See for yourself:
SELECT pg_column_size((1::bigint, '[873021700.123456,873021800.123456]'::numrange))
, pg_column_size((1::bigint, 873021700123456::bigint, 873021800123456::bigint));
Do not have 4100 identical tables. That approach brings a host of problems and is repeatedly panned in this and other forums.
4100 "stations", each with, say 10 readings, per hour? 41000 is about 12/second (average), which is not very much if you had one row per datapoint per station. One table with 3 columns: hour (or datetime), station_id, reading. And batch the 10 rows in a single INSERT
statement for a little more efficiency.
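A minimal sketch of that layout and a batched INSERT, in MySQL syntax (names, types and values are illustrative only):

CREATE TABLE readings (
  dt DATETIME NOT NULL,                   -- hour (or full datetime) of the reading
  station_id SMALLINT UNSIGNED NOT NULL,
  reading FLOAT NOT NULL
);

-- one INSERT per station and hour, carrying all 10 readings
INSERT INTO readings (dt, station_id, reading) VALUES
  ('2024-01-06 06:00:00', 1, 21.4),
  ('2024-01-06 06:00:00', 1, 55.0),
  ('2024-01-06 06:00:00', 1, 1013.2);
  -- ... remaining readings for that station/hour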
Now you can fetch the data in a variety of ways: "just temperatures at 6am on weekends" becomes a simple and reasonably efficient query.
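For example, the 6am-on-weekends query could look roughly like this against the hypothetical readings table above (with only a generic reading column, restricting it to temperatures would additionally need a reading-type column or the wide layout below):

SELECT station_id, dt, reading
FROM readings
WHERE HOUR(dt) = 6
  AND DAYOFWEEK(dt) IN (1, 7);  -- 1 = Sunday, 7 = Saturday in MySQL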
Alternatively, have one table with 12 columns: hour/datetime, station_id, and 10 specific readings. This is likely to be more efficient in all ways. It also has the advantage that each measurement can get a fitting datatype. For temperature or humidity, TINYINT SIGNED (1 byte) allows -128..127, or DECIMAL(5,2) (3 bytes) covers +/-999.99, instead of a generic FLOAT (4 bytes).
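A sketch of that wider layout with per-measurement types (column names are illustrative, only a few of the 10 readings are shown, and the primary key is one reasonable choice rather than part of the answer):

CREATE TABLE readings_wide (
  dt DATETIME NOT NULL,
  station_id SMALLINT UNSIGNED NOT NULL,
  temperature TINYINT SIGNED,   -- 1 byte, -128..127
  humidity DECIMAL(5,2),        -- 3 bytes, +/-999.99
  -- ... further reading columns with fitting types ...
  PRIMARY KEY (station_id, dt)
);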
The alternatives probably take a similar amount of disk space. This database will be rather big after a year; you should probably estimate the future size and think ahead about indexing, and other issues.
Best Answer
Finally I've found a solution. I create a string array using the CONCAT and GROUP_CONCAT functions and then cast it to JSON.
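Roughly like this, as a sketch of the idea (table and column names are placeholders, since the original query is not shown; CAST(... AS JSON) needs MySQL 5.7.8+):

-- build a JSON array literal per group, then cast the string to JSON
SELECT station_id
     , CAST(CONCAT('[', GROUP_CONCAT(reading ORDER BY dt), ']') AS JSON) AS readings
FROM readings
GROUP BY station_id;

For string values, each element would additionally need quoting inside the CONCAT before the result is valid JSON.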