Based on this related question on SO.
This works fine if you only want simple matches. Suppose tracks
have both Artist
and Title
field in the JSON data. So we have something like:
INSERT INTO tracks (id, data)
VALUES (1, '[{"artist": "Simple Plan", "title": "Welcome to My Life"}]');
We create the index like in the original question:
CREATE INDEX tracks_artists_gin_idx ON tracks
USING GIN (json2arr(data, 'artist'));
CREATE INDEX tracks_title_gin_idx ON tracks
USING GIN (json2arr(data, 'title'));
So now we have two fields to match. As you can see, if we perform the original query (with very naive modifications) of:
SELECT *
FROM tracks
WHERE '{"ARTIST NAME"}'::text[] <@ (json2arr(data, 'artist'))
AND '{"TITLE"}'::text[] <@ (json2arr(data, 'title'))
This will give the wrong answer because the indices of the array of artist and title in the JSON array do not have to match for this query to match something in the JSON. What is the proper way of doing this query so we can get the exact match we need? Does json2arr()
need to be changed?
Edit: Why this is wrong
Suppose our table has records like:
INSERT INTO tracks (id, data) VALUES
(1, '[{"artist": "Simple Plan", "title": "Welcome to My Life"}]')
, (2, '[{"artist": "Another Artist", "title": "Welcome to My Life"},
{"artist": "Simple Plan", "title": "Perfect"}]');
If you query like:
SELECT *
FROM tracks
WHERE '{"Simple Plan"}'::text[] <@ (json2arr(data, 'artist'))
AND '{"Welcome to my Life"}'::text[] <@ (json2arr(data, 'title'))
Both records will be matched (both record 1 and 2), even though you really only wanted the first record.
Best Answer
Why?
The expressed requirement that
artist
andtitle
must match in the same element of the JSON array is not reflected in your query, which finds all rows where at least one element matches theartist
and another (possibly a different one) matches thetitle
.The example data for your first case was inconclusive, since the query cannot fail this way for a single array element. Your second example demonstrates the case well, though.
Solution in Postgres 9.3
There are various ways to fix this. One way would be to translate each json array element to an SQL array of composite type consisting of
artist
andtitle
and match the whole type as one.Another way would be to keep indexes like you have now (or even a single composite spanning both expressions:
(json2arr(data, 'artist'), json2arr(data, 'title'))
. Your current query identifies possible matches. Unnest the json arraydata
for all identified candidates and check whether bothartist
andtitle
match on the same element. May or may not be efficient enough.Better use
jsonb
in n Postgres 9.4I am not going into detail for
json
/ pg 9.3, because the release ofjsonb
in Postgres 9.4 with advanced indexing capabilities mostly obsoleted the problem. This can be implemented much simpler and more efficient with a native GIN index on ajsonb
column. There are various options. To optimize for the presented case:Index:
Query:
Sequence of attributes in the JSON value and insignificant whitespace don't matter for
jsonb
. I added detailed information for jsonb to the referenced answer on SO: