MySQL – How Does the Spec Store JSON in a Table

jsonmariadbMySQLsql-standard

From this question, I quoted the docs,

"MariaDB implements this as a LONGTEXT rather, as the JSON data type contradicts the SQL standard, and MariaDB's benchmarks indicate that performance is at least equivalent."*

That spurred a comment

I had to google and it appears that MariaDB now supports CHECK constraints. Not sure what they mean when they say that JSON contradicts the SQL standard? – Lennart yesterday

I want to know what they meant too. So what does MariaDB mean when say "contradicts the standard."

Best Answer

SQL:2016

This is from my own read of ISO/IEC TR 19075-6:2017: SQL support for JavaScript Object Notation (JSON)

Ignoring both (a) the track record of MySQL with regard to the spec, and (b) the fact that they're now implementing a JSON type. It seems as if they're referring to this,

A <JSON value expression> may have an optional <JSON input clause>. This indicates that the values expression should be parsed as JSON. The standardized option is FORMAT JSON; implementations may also support syntax such as FORMAT AVRO or FORMAT BSON. When using the <JSON input clause>, the <value expression> may be either a character string or a binary string.

Essentially, the spec provides for JSON values to come from strings, or binary blobs. The method of making them JSON is done through a decorator which identifies it and its FORMATs. The spec explicitly mentions "implementation-defined JSON representation option" it gives BSON and Apache AVRO as examples. It would look something like this (my read of the spec)

SELECT str FORMAT JSON BSON;

From there the value construction is accessible with the "SQL/JSON data model" and the rest of the spec applies to it. However there is no binary storage mechanism, or stipulation that one must be implemented for SQL/JSON. Implementing one doesn't seem to contradict the spec. In fact it goes on to say,

Constructor functions use values of SQL types and produce JSON values (JSON objects or JSON arrays) represented in SQL character or binary string types.

So they can be represented as a binary type,

Query functions evaluate SQL/JSON path language expressions against JSON values, producing values of SQL/JSON types, which are converted to SQL types.

And SQL queries need to return "SQL types", regardless of if there is a named binary type under the hood. Most databases do that though they return character/varchar/text representations of JSON.

The spec's only example of storing JSON on a table uses strings,

CREATE TABLE employees (
emp_id INTEGER,
name CHARACTER VARYING(50),
salary DECIMAL(7,2),
dept_id INTEGER,
json_emp CHARACTER VARYING(5000) )

That essentially means the it's serialized to and parsed according to RFC7159. It's also somewhat amusing that the retrieval of json_emp is missing from the spec.

tldr: I don't think anything about implementing a binary type "contradicts" the spec, as it is explicitly permitted. I think the right word is "complements."