I am trying to build a query that results in a JSON object generated by SQL Server. I've found that I can use subqueries to populate a field (in this case the questions field) with a JSON string that that contains a list of questions.
Below is the query:
SELECT
quizzes.id AS 'id',
quizzes.name AS 'name',
quizzes.description AS 'description',
quizzes.instructions AS 'instructions',
author.id AS 'author.id',
author.midas AS 'author.midas',
author.first_name AS 'author.first_name',
author.last_name AS 'author.last_name',
author.email AS 'author.email',
author.tel AS 'author.tel',
author.department_name AS 'author.department_name',
author.created_at AS 'author.created_at',
author.last_updated AS 'author.last_updated',
course.id AS 'course.id',
course.name AS 'course.name',
course.description AS 'course.description',
course.crn AS 'course.crn',
instructor.id AS 'course.instructor.id',
instructor.midas AS 'course.instructor.midas',
instructor.first_name AS 'course.instructor.first_name',
instructor.last_name AS 'course.instructor.last_name',
instructor.email AS 'course.instructor.email',
instructor.tel AS 'course.instructor.tel',
instructor.department_name AS 'course.instructor.department_name',
instructor.created_at AS 'course.instructor.created_at',
instructor.last_updated AS 'course.instructor.last_updated',
course.created_at AS 'course.created_at',
course.last_updated AS 'course.last_updated',
(
SELECT
questions.id AS 'id',
questions.text AS 'text',
question_types.id AS 'type.id',
question_types.name AS 'type.name',
question_types.created_at AS 'type.created_at',
question_types.description AS 'type.description',
question_author.id AS 'author.id',
question_author.midas AS 'author.midas',
question_author.first_name AS 'author.first_name',
question_author.last_name AS 'author.last_name',
question_author.email AS 'author.email',
question_author.tel AS 'author.tel',
question_author.department_name AS 'author.department_name',
question_author.created_at AS 'author.created_at',
question_author.last_updated AS 'author.last_updated',
questions.is_graded AS 'is_graded',
questions.score_value AS 'score_value',
questions.created_at AS 'created_at',
questions.last_updated AS 'last_updated'
FROM
questions
LEFT JOIN users AS question_author ON question_author.id = questions.author
LEFT JOIN question_types ON question_types.id = questions.type
WHERE
questions.quiz = quizzes.id FOR JSON PATH, INCLUDE_NULL_VALUES
) AS 'questions',
quizzes.created_at AS 'created_at',
quizzes.last_updated AS 'last_updated'
FROM
quizzes
LEFT JOIN users AS author ON quizzes.author = author.id
LEFT JOIN courses AS course ON quizzes.course = course.id
LEFT JOIN users AS instructor ON course.instructor = instructor.id FOR JSON PATH,
INCLUDE_NULL_VALUES;
The problem is: when I execute this query it responds with two rows where the resulting JSON string is split in two. Obviously this is undesirable.
After investigation, I found that if I remove the LEFT JOIN
's, then the query responds as it should (only one row with the entire string intact).
Below is an example of the following behavior:
SELECT n = sc1.name FROM sys.syscolumns sc1 FOR JSON AUTO
As can be seen above, 11 rows are being returned.
And the length of the JSON output is about 20,000 characters.
I am using the following version of SQL Server
Microsoft SQL Server 2019 - 15.0.4073.23 (X64)
Developer Edition (64-bit) on Linux (Ubuntu 18.04.5 LTS) <X64>
Why is this happening? How can I fix it?
Best Answer
After further research, I have discovered from this StackOverflow post that SQL Server breaks
FOR JSON
queries into "~2kb chunks".This means that only ~2000 characters can be sent per chunk.
UPDATE:
With the help of Max Vernon and Andriy M we have found a fairly simple solution to this problem.
The chats that lead up to this can be found here.
The above query returns two columns.
Ideally, you would replace the query between the two parentheses with your own query.
Why does this work?
According to Microsoft's Documentation
(While it is true that the above refers to XML, the same is true for JSON as well.)
The reason that JSON and XML responses are returned in chunks, to begin with, are for performance reasons:
The above solution circumvents this by setting the result of
FOR JSON
to a variable first, then sending the value of the variable, which results in SQL Server returning the response to the client as one row.It should be noted that some database clients (specifically SQL Server Management Studio) are capable of "rebuilding" the chunked response, but if you are working with PHP (PDO) or operating on a Free Trial client (such as TablePlus for Mac) you will see the raw, chunked response.
Performance
As far as performance goes, I have not done any extensive testing, but I can provide the following data from the limited testing I have done:
Using a MacBook Pro running the latest OS, I have found that on average the following query
Will process in about 23,300μs
While the query
Spent on average 137.8μs processing.
This seems to directly conflict with what the documentation says, so I am unsure how trustworthy these results are. However, it may be worth testing this yourself.