Return All Rows Containing Specific Value in JSON Array in SQL Server

jsonsql server

I have the following table:

CREATE TABLE [dbo].[Journals](
    [Id] [nvarchar](450) NOT NULL,
    [CorrelationId] [nvarchar](450) NOT NULL,
    [Version] [int] NOT NULL,
    [Type] [nvarchar](max) NOT NULL,
    [Data] [nvarchar](max) NOT NULL,
    [Created] [datetime2](7) NOT NULL,
    [Updated] [datetime2](7) NOT NULL,
 CONSTRAINT [PK_Journals] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO

ALTER TABLE [dbo].[Journals] ADD  DEFAULT (getutcdate()) FOR [Created]
GO

ALTER TABLE [dbo].[Journals] ADD  DEFAULT (getutcdate()) FOR [Updated]
GO

One row could look like this:

Id: 1833eac9-3ea7-4613-be26-af3f7589d6f0
CorrelationId: 0HLNKM700ILBN:00000001
Version: 1
Type: Notification
Data: [{"type": "Event_Received","timeStamp": "2019-06-24T12:47:54.3960942Z","data": {"version": 1,"notificationName": "My Notification","receiverName": "John Doe","receiverUserIdentifier": "123456789","receiverEmail": "john.doe@example.org","senderName": "Some Sender","senderEmail": "someone@example.org","contentSubject": "This is my subject","contentBody": "And this is my body","attachmentNames": []},"sensitive": true}, {"type": "Event_ValidationSucceded","timeStamp": "2019-06-24T12:47:54.4167034Z","data": "{}","sensitive": false}, {"type": "Event_PublishStart","timeStamp": "2019-06-24T12:47:54.4691455Z","data": "{}","sensitive": false}, {"type": "Event_PublishSucceeded","timeStamp": "2019-06-24T12:47:54.4740542Z","data": "{}","sensitive": true}]
Created: 2019-06-19 12:29:20.8100000
Updated: 2019-06-19 12:30:51.1123311

I want to fetch all the Ids from the rows that contains array entries where Sensitive = true

So I have tried something like this:

SELECT Id
FROM Journals journals
CROSS APPLY OPENJSON(journals.Data)
WITH (JournalType nvarchar(255) '$.type', Sensitive bit '$.sensitive') as jsonValues
WHERE Sensitive = 1

This works but I get duplicate ids back (since the array contains more than one item where Sensitive = true).

How can I limit the query so that I don't receive duplicate Ids when the array contains more than one item where Sensitive = true?

Wanted output:

1833eac9-3ea7-4613-be26-af3f7589d6f0

Actual output:

1833eac9-3ea7-4613-be26-af3f7589d6f0
1833eac9-3ea7-4613-be26-af3f7589d6f0

I guess that I could add a DISTINCT to SELECT DISTINCT Id but that feels like cheating, or? 😀

Best Answer

You could opt to use TOP(1) in the CROSS APPLY

SELECT Id
FROM Journals journals
CROSS APPLY 
(
SELECT TOP(1) Sensitive
FROM OPENJSON(journals.Data)
WITH (JournalType nvarchar(255) '$.type', Sensitive bit '$.sensitive')
WHERE Sensitive = 1
) as jsonValues;

Or, as you mentioned DISTINCT

SELECT DISTINCT Id
FROM Journals journals
CROSS APPLY OPENJSON(journals.Data)
WITH (JournalType nvarchar(255) '$.type', Sensitive bit '$.sensitive') as jsonValues
WHERE Sensitive = 1;

Main difference between the two

When using the TOP(1) the values will be filtered and the top operator will be applied earlier in the execution plan

enter image description here

When using the DISTINCT solution, the filtering and grouping is only done after the JSON function is (cross) applied

enter image description here

Which is why I would opt for the TOP(1) + CROSS APPLY solution.


As a side note, storing unique identifiers as nvarchar() is not ideal, especially when using a huge data type like nvarchar(450). Consider using the minimal or close to the minimal amount of space needed in your column datatypes & using the uniqueidentifier datatype.