Postgresql – Index for finding combinations of attributes in JSON array

arrayindexjsonpostgresql

This works fine if you only want simple matches. Suppose tracks have both Artist and Title field in the JSON data. So we have something like:

INSERT INTO tracks (id, data)
VALUES (1, '[{"artist": "Simple Plan", "title": "Welcome to My Life"}]');

We create the index like in the original question:

CREATE INDEX tracks_artists_gin_idx ON tracks
USING GIN (json2arr(data, 'artist'));

CREATE INDEX tracks_title_gin_idx ON tracks
USING GIN (json2arr(data, 'title'));

So now we have two fields to match. As you can see, if we perform the original query (with very naive modifications) of:

SELECT *
FROM   tracks
WHERE  '{"ARTIST NAME"}'::text[] <@ (json2arr(data, 'artist'))
AND    '{"TITLE"}'::text[]       <@ (json2arr(data, 'title'))

This will give the wrong answer because the indices of the array of artist and title in the JSON array do not have to match for this query to match something in the JSON. What is the proper way of doing this query so we can get the exact match we need? Does json2arr() need to be changed?

Edit: Why this is wrong

Suppose our table has records like:

INSERT INTO tracks (id, data)  VALUES
  (1, '[{"artist": "Simple Plan",    "title": "Welcome to My Life"}]')
, (2, '[{"artist": "Another Artist", "title": "Welcome to My Life"},
        {"artist": "Simple Plan",    "title": "Perfect"}]');

If you query like:

SELECT *
FROM   tracks
WHERE  '{"Simple Plan"}'::text[]        <@ (json2arr(data, 'artist'))
AND    '{"Welcome to my Life"}'::text[] <@ (json2arr(data, 'title'))

Both records will be matched (both record 1 and 2), even though you really only wanted the first record.

Best Answer

Why?

The expressed requirement that artist and title must match in the same element of the JSON array is not reflected in your query, which finds all rows where at least one element matches the artist and another (possibly a different one) matches the title.

The example data for your first case was inconclusive, since the query cannot fail this way for a single array element. Your second example demonstrates the case well, though.

Solution in Postgres 9.3

There are various ways to fix this. One way would be to translate each json array element to an SQL array of composite type consisting of artist and title and match the whole type as one.

Another way would be to keep indexes like you have now (or even a single composite spanning both expressions: (json2arr(data, 'artist'), json2arr(data, 'title')). Your current query identifies possible matches. Unnest the json array data for all identified candidates and check whether both artist and title match on the same element. May or may not be efficient enough.

Better use `jsonb` in n Postgres 9.4

I am not going into detail for json / pg 9.3, because the release of jsonb in Postgres 9.4 with advanced indexing capabilities mostly obsoleted the problem. This can be implemented much simpler and more efficient with a native GIN index on a jsonb column. There are various options. To optimize for the presented case:

CREATE TABLE tracks (id serial, data jsonb);

INSERT INTO tracks (id, data)  VALUES
  (1, '[{"artist": "Simple Plan",    "title": "Welcome to My Life"}]')
, (2, '[{"artist": "Another Artist", "title": "Welcome to My Life"},
        {"artist": "Simple Plan",    "title": "Perfect"}]');

Index:

CREATE INDEX tracks_data_gin_idx ON tracks USING gin (data jsonb_path_ops);

Query:

SELECT * FROM tracks
WHERE  data @> '[{"artist": "Simple Plan", "title": "Welcome to My Life"}]';

Sequence of attributes in the JSON value and insignificant whitespace don't matter for jsonb. I added detailed information for jsonb to the referenced answer on SO:

Index for finding an element in a JSON array

Related Solutions

Postgresql – How to create an index on an integer json property in postgres

Try this instead:

CREATE INDEX user_reputation_idx ON users(cast("user"->>'reputation' AS int));

The Postgres syntax shortcut :: for casts is not allowed without additional parentheses in an index definition (see @bma's comment). It works with the standard SQL function, though: cast(expression AS type) This is not related to the json type per se.

Either way, you can still use the syntax shortcut expression::type in expressions that utilize the index.

PostgreSQL filtering on array length inside JSON

Without knowing your data, I can only guess that the selectivity of your index is low (which happen if the length of the array does not vary much).

One trick to overcome this might be changing the query slightly and creating a covering index. For this, choose a NOT NULL column (for example, the primary key of the table) to count, and then include this column in the index:

CREATE INDEX n_items ON your_table (jsonb_array_length(data->'items'), id);

SELECT count(id) 
  FROM your_table
 WHERE JSONB_ARRAY_LENGTH(table.data->'items') = 2;

This will hopefully turn into an index-only scan (I tested this omitting the jsonb part, but you will be able to tell if it works).