MySQL – Working with JSON data

Tags: json, mysql

I am finding that documentation for working with MySQL JSON data is still, at this point, apparently in its infancy. Either that or I'm doing it wrong.
I found several docs online, e.g. https://benjaminlistwon.com/blog/working-with-json-data-in-mysql-part-1-of-3/ and others – but none of them are working for me.

I recently switched from storing the station data in separate tables per station (4,100+ tables I had to run a "dynamic" PHP query against to get the data – wow!) to JSON, for speed and performance reasons, with the added benefit of not doing 4,200 table writes per hour.

Getting the aggregate data is much faster – for a single row of data I can just iterate through the JSON keys. The problem is that pulling data from 83+ rows where key = value makes PHP throw an out-of-memory error. My guess is that means I need to match the JSON value in the query instead of in PHP.
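Assuming each row's jsonData is a JSON object keyed by station ID (the path and value here are illustrative – adjust them to the real document shape), the key/value match could be pushed into the WHERE clause so PHP never sees the non-matching rows:

```sql
-- Hypothetical server-side filter: only rows whose KOJC temperature is 25.
-- ->> extracts and unquotes, so the comparison is against the bare string.
SELECT jsonData
FROM StationData
WHERE jsonData->>'$.KOJC.Temperature' = '25';
```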

Below: SQL dump of a JSON based data table, StationData. Weather data from various stations collected hourly.

https://drive.google.com/open?id=0B6ScTEIoK3tPbGhrZkJOZmhRUTQ

The problem I'm having is that functions like JSON_EXTRACT, JSON_SEARCH, and the other JSON_* functions just return NULL.

For example, for Olathe, Kansas, I want all the data stored under {"KOJC":{"Temperature":"25", "Dewpoint":"23"}} and so on – but nothing from any other keys. The key is KOJC.
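If the stored documents really are objects with station IDs as top-level keys (an assumption – the NULLs below suggest they may not be), a sketch of pulling just the KOJC subtree would be:

```sql
-- Return only the KOJC sub-document, skipping rows that lack that key.
-- JSON_CONTAINS_PATH checks key existence without extracting anything.
SELECT jsonData->'$.KOJC' AS KOJC
FROM StationData
WHERE JSON_CONTAINS_PATH(jsonData, 'one', '$.KOJC');
```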
(For giggles, I tried the MySQL doc example):

select JSON_KEYS('{"a": 1, "b": {"c": 30}}');
+---------------------------------------+
| JSON_KEYS('{"a": 1, "b": {"c": 30}}') |
+---------------------------------------+
| ["a", "b"]                            |
+---------------------------------------+
1 row in set (0.02 sec)

select JSON_KEYS(jsonData) from StationData;
+---------------------+
| JSON_KEYS(jsonData) |
+---------------------+
| NULL                |

etc… 85 rows blah blah

select JSON_EXTRACT(jsonData, "$.KOJC") from StationData;

+----------------------------------+
| JSON_EXTRACT(jsonData, "$.KOJC") |
+----------------------------------+
| NULL                             |
| NULL                             |

etc….
85 rows in set (0.16 sec)

select jsonData->>'$.KOJC' as Station from StationData;
+---------+
| Station |
+---------+
| NULL    |

etc….
85 rows in set (0.14 sec)

select * from StationData were JSON_SEARCH(jsonData, 'all', '$.KOJC');
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'JSON_SEARCH(jsonData, 'all', '$.KOJC')' at line 1

select * from StationData where JSON_EXTRACT(jsonData, "$.KOJC");
Empty set (0.15 sec)
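Two separate things go wrong in the JSON_SEARCH attempt above: `were` is a typo for `where`, which is what actually trips the 1064 syntax error, and JSON_SEARCH's third argument is a string value to search for, not a path, so '$.KOJC' would never match a key anyway. A corrected sketch that tests for the key itself:

```sql
-- Keep rows whose document contains a top-level KOJC key.
SELECT *
FROM StationData
WHERE JSON_CONTAINS_PATH(jsonData, 'one', '$.KOJC');
```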

UPDATE – I found what's happening. JSON_EXTRACT works on a plain JSON object, but when the document is an array I am unable to pull out array elements.

select JSON_EXTRACT('{"KOJC":{"Tmp":"25"}}', '$.KOJC');
+-------------------------------------------------+
| JSON_EXTRACT('{"KOJC":{"Tmp":"25"}}', '$.KOJC') |
+-------------------------------------------------+
| {"Tmp": "25"}                                   |
+-------------------------------------------------+
1 row in set (0.01 sec)

^ that works
But the below does not when presented with an array:

select JSON_EXTRACT('[{"KOJC":{"Tmp":"25"}}, {"KTWX":{"Tmp":"32"}}]', '$.KOJC');
+--------------------------------------------------------------------------+
| JSON_EXTRACT('[{"KOJC":{"Tmp":"25"}}, {"KTWX":{"Tmp":"32"}}]', '$.KOJC') |
+--------------------------------------------------------------------------+
| NULL                                                                     |
+--------------------------------------------------------------------------+
1 row in set (0.00 sec)

select JSON_EXTRACT('[{"KOJC":{"Tmp":"25"}}, {"KTWX":{"Tmp":"32"}}]', '$[KOJC][*]');
ERROR 3143 (42000): Invalid JSON path expression. The error is around character position 2.

select JSON_EXTRACT('[{"KOJC":{"Tmp":"25"}}, {"KTWX":{"Tmp":"32"}}]', '$[$.KOJC][*]');
ERROR 3143 (42000): Invalid JSON path expression. The error is around character position 2.
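For what it's worth, when the document is an array of objects the member lookup has to be combined with an array wildcard: the path `$[*].KOJC` means "the KOJC member of any array element". Elements without that key are simply skipped, and the matches come back wrapped in an array:

```sql
SELECT JSON_EXTRACT('[{"KOJC":{"Tmp":"25"}}, {"KTWX":{"Tmp":"32"}}]',
                    '$[*].KOJC');
-- returns [{"Tmp": "25"}]
```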

Best Answer

Do not have 4,100 identical tables. There are many issues with that design, and it has been repeatedly panned in this and other forums.

4,100 "stations", each with, say, 10 readings, per hour? 41,000 rows is about 12 per second (on average), which is not very much if you have one row per datapoint per station. Use one table with four columns: hour (or datetime), station_id, a reading type (to tell the 10 measurements apart), and the reading itself. And batch the 10 rows into a single INSERT statement for a little more efficiency.
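A sketch of that "tall" layout (table and column names are illustrative), with one station's hourly readings batched into a single INSERT:

```sql
CREATE TABLE readings (
    dt           DATETIME     NOT NULL,  -- top of the hour
    station_id   CHAR(4)      NOT NULL,  -- e.g. 'KOJC'
    reading_type VARCHAR(20)  NOT NULL,  -- 'Temperature', 'Dewpoint', ...
    value        DECIMAL(6,2) NOT NULL,
    PRIMARY KEY (station_id, dt, reading_type)
);

INSERT INTO readings (dt, station_id, reading_type, value) VALUES
    ('2017-01-15 06:00:00', 'KOJC', 'Temperature', 25.0),
    ('2017-01-15 06:00:00', 'KOJC', 'Dewpoint',    23.0);
```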

Now you can fetch the data in a variety of ways – "just temperatures at 6am on weekends" becomes a simple, reasonably efficient query.
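Assuming a tall table named `readings` with columns (dt, station_id, reading_type, value) – names are illustrative – that query might look like:

```sql
-- Temperatures at 6am on weekends; DAYOFWEEK: 1 = Sunday, 7 = Saturday.
SELECT station_id, dt, value
FROM readings
WHERE reading_type = 'Temperature'
  AND HOUR(dt) = 6
  AND DAYOFWEEK(dt) IN (1, 7);
```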

Alternatively, have one table with 12 columns: hour/datetime, station_id, and 10 specific readings. This is likely to be more efficient in every way. It also has the advantage that each measurement can have its own specific datatype. For temperature or humidity: TINYINT SIGNED is 1 byte and allows -128..127, or DECIMAL(5,2) (3 bytes) allows ±999.99, instead of a generic FLOAT (4 bytes).
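A minimal sketch of that "wide" layout (column names assumed from the question's sample keys):

```sql
CREATE TABLE station_hourly (
    dt          DATETIME NOT NULL,
    station_id  CHAR(4)  NOT NULL,
    temperature TINYINT SIGNED,   -- -128..127, 1 byte
    dewpoint    TINYINT SIGNED,
    -- one typed column per remaining measurement goes here
    PRIMARY KEY (station_id, dt)
);
```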

The alternatives probably take a similar amount of disk space. This database will be rather big after a year; you should estimate the future size and think ahead about indexing and other issues.