SQL Server – Why JSON Response is Split into Multiple Rows

jsonquerysql serversubquery

I am trying to build a query that results in a JSON object generated by SQL Server. I've found that I can use subqueries to populate a field (in this case the questions field) with a JSON string that that contains a list of questions.

Below is the query:

SELECT
    quizzes.id AS 'id',
    quizzes.name AS 'name',
    quizzes.description AS 'description',
    quizzes.instructions AS 'instructions',
    author.id AS 'author.id',
    author.midas AS 'author.midas',
    author.first_name AS 'author.first_name',
    author.last_name AS 'author.last_name',
    author.email AS 'author.email',
    author.tel AS 'author.tel',
    author.department_name AS 'author.department_name',
    author.created_at AS 'author.created_at',
    author.last_updated AS 'author.last_updated',
    course.id AS 'course.id',
    course.name AS 'course.name',
    course.description AS 'course.description',
    course.crn AS 'course.crn',
    instructor.id AS 'course.instructor.id',
    instructor.midas AS 'course.instructor.midas',
    instructor.first_name AS 'course.instructor.first_name',
    instructor.last_name AS 'course.instructor.last_name',
    instructor.email AS 'course.instructor.email',
    instructor.tel AS 'course.instructor.tel',
    instructor.department_name AS 'course.instructor.department_name',
    instructor.created_at AS 'course.instructor.created_at',
    instructor.last_updated AS 'course.instructor.last_updated',
    course.created_at AS 'course.created_at',
    course.last_updated AS 'course.last_updated',
    
    
    (
        SELECT
            questions.id AS 'id',
            questions.text AS 'text',
            question_types.id AS 'type.id',
            question_types.name AS 'type.name',
            question_types.created_at AS 'type.created_at',
            question_types.description AS 'type.description',
            question_author.id AS 'author.id',
            question_author.midas AS 'author.midas',
            question_author.first_name AS 'author.first_name',
            question_author.last_name AS 'author.last_name',
            question_author.email AS 'author.email',
            question_author.tel AS 'author.tel',
            question_author.department_name AS 'author.department_name',
            question_author.created_at AS 'author.created_at',
            question_author.last_updated AS 'author.last_updated',
            questions.is_graded AS 'is_graded',
            questions.score_value AS 'score_value',
            questions.created_at AS 'created_at',
            questions.last_updated AS 'last_updated'
        FROM
            questions
        LEFT JOIN users AS question_author ON question_author.id = questions.author
        LEFT JOIN question_types ON question_types.id = questions.type
        WHERE
            questions.quiz = quizzes.id FOR JSON PATH, INCLUDE_NULL_VALUES
    ) AS 'questions',
    
    
    quizzes.created_at AS 'created_at',
    quizzes.last_updated AS 'last_updated'
FROM
    quizzes
    LEFT JOIN users AS author ON quizzes.author = author.id
    LEFT JOIN courses AS course ON quizzes.course = course.id
    LEFT JOIN users AS instructor ON course.instructor = instructor.id FOR JSON PATH,
    INCLUDE_NULL_VALUES;

The problem is: when I execute this query it responds with two rows where the resulting JSON string is split in two. Obviously this is undesirable.

After investigation, I found that if I remove the LEFT JOIN's, then the query responds as it should (only one row with the entire string intact).


Below is an example of the following behavior:

SELECT n = sc1.name FROM sys.syscolumns sc1 FOR JSON AUTO

As can be seen above, 11 rows are being returned.

Result

And the length of the JSON output is about 20,000 characters.

JSON Length


I am using the following version of SQL Server

Microsoft SQL Server 2019 - 15.0.4073.23 (X64) 
Developer Edition (64-bit) on Linux (Ubuntu 18.04.5 LTS) <X64>

Why is this happening? How can I fix it?

Best Answer

After further research, I have discovered from this StackOverflow post that SQL Server breaks FOR JSON queries into "~2kb chunks".

Sql Server splits result of FOR JSON query into ~2KB chunk, so you should either concatenate fragments like on the MSDN page or you can stream results into some output stream.

This means that only ~2000 characters can be sent per chunk.


UPDATE:

With the help of Max Vernon and Andriy M we have found a fairly simple solution to this problem.

DECLARE @json nvarchar(max);

;WITH src (n) AS
(
    SELECT n = sc1.name
    FROM sys.syscolumns sc1
    FOR JSON AUTO
)
SELECT @json = src.n
FROM src

SELECT @json, LEN(@json);

The chats that lead up to this can be found here.

The above query returns two columns.

  1. The fully assembled JSON string
  2. The length of that string

Ideally, you would replace the query between the two parentheses with your own query.

Why does this work?

According to Microsoft's Documentation

SQL Server uses a predefined column name for this rowset with one column of type NTEXT - “XML_F52E2B61-18A1-11d1-B105-00805F49916B” – to indicate chunked XML rowset in UTF-16 encoding. This requires special handling of the XML chunk rowset by the APIs to expose it as a single XML instance on the client side. In ADO.Net, one needs to use ExecuteXmlReader, and in ADO/OLEDB one should use the ICommandStream interface.

(While it is true that the above refers to XML, the same is true for JSON as well.)

The reason that JSON and XML responses are returned in chunks, to begin with, are for performance reasons:

For maximum XML [JSON] publishing performance FOR XML [JSON] does steaming XML formatting of the resulting rowset and directly sends its output to the server side TDS code in small chunks without buffering whole XML in the server space. The chunk size is 2033 UCS-2 characters. Thus, XML larger than 2033 UCS-2 characters is sent to the client side in multiple rows each containing a chunk of the XML.

The above solution circumvents this by setting the result of FOR JSON to a variable first, then sending the value of the variable, which results in SQL Server returning the response to the client as one row.

It should be noted that some database clients (specifically SQL Server Management Studio) are capable of "rebuilding" the chunked response, but if you are working with PHP (PDO) or operating on a Free Trial client (such as TablePlus for Mac) you will see the raw, chunked response.


Performance

As far as performance goes, I have not done any extensive testing, but I can provide the following data from the limited testing I have done:

Using a MacBook Pro running the latest OS, I have found that on average the following query

SELECT n = sc1.name
    FROM sys.syscolumns sc1
    FOR JSON AUTO

Will process in about 23,300μs

While the query

DECLARE
    @json nvarchar (max);

;WITH src (n) AS
(
    SELECT n = sc1.name
    FROM sys.syscolumns sc1
    FOR JSON AUTO
)
SELECT @json = src.n
FROM src

SELECT @json, LEN(@json);

Spent on average 137.8μs processing.

This seems to directly conflict with what the documentation says, so I am unsure how trustworthy these results are. However, it may be worth testing this yourself.

Test data results