PostgreSQL – Indexing by Subkey in jsonb

indexjsonpostgresqlpostgresql-9.6

I have a table payment with a jsonb column application. It contains payments and the total amount for one or more tours. For example, a parent can pay 1000 $ for all his family, e.g. 4 children and his wife. Then one row in payment contains 6 records in the jsonb field application, describing the tour and the customer.

[
   {
    "tour_id":
      {
        "id":"d61802ff-3eec-4a72-97ca-832f51b96bf0","name":"Paris  2018"
      },
      "amount":"500.00",
      "customer_id":
          {
             "id":"f492b26d-d260-4bcb-8e96-6532fdf38665",
             "name":"OBAMA John"
          },
     "id":"5cd8338c-15c2-48c5-b305-9e172bb33a9a"
   },
   {
    "tour_id":
      {
        "id":"a52d38d4-9bfb-4ffa-1122-e536f04b0c60","name":"London  2018"
      },
      "amount":"500.00",
      "customer_id":
          {
             "id":"f492b26d-d260-4bcb-8e96-6532fdf38665",
             "name":"OBAMA John"
          },
     "id":"5cd8338c-15c2-48c5-b305-9e172bb33a9a"
   },
]

My id is unique in all the postgres database, I'm using UUID v4. Which is the best way to index this id? Really the name is redundant here, because I have a table for customers and tours, and I could just record the id:

[
   {
    "tour_id":"d61802ff-3eec-4a72-97ca-832f51b96bf0",
    "amount":"500.00",
    "customer_id":"f492b26d-d260-4bcb-8e96-6532fdf38665",
    "id":"5cd8338c-15c2-48c5-b305-9e172bb33a9a"
   },
   (...more elements)

But I'd like to get advise on the best indexing strategy (or alternative solutions).

Best Answer

A jsonb GIN index with the jsonb_path_ops operator class would be an option.

But to just index (multiple) application #>> '{*, tour_id, id}') values per jsonb field (pseudocode, the * is illegal), I would write a tiny, IMMUTABLE SQL function to extract and cast the UUID fields to an UUID array (uuid[]) - and create an expression GIN index on that. Should be considerably smaller and hence faster:

CREATE OR REPLACE FUNCTION f_extract_tour_ids(jsonb)
  RETURNS uuid[] AS
$func$
SELECT ARRAY(SELECT (jsonb_array_elements($1) #>> '{tour_id, id}')::uuid)
$func$ LANGUAGE sql IMMUTABLE;

What was missing in my original answer: there is no GIN operator class for uuid[], yet (up to Postgres 10). See this thread on pgsql-hackers.

Update: Starting with Postgres 11, the additional module btree_gin provides the missing operator class.

You have to create it yourself (once per database):

CREATE OPERATOR CLASS _uuid_ops DEFAULT 
   FOR TYPE _uuid USING gin AS 
   OPERATOR 1 &&(anyarray, anyarray), 
   OPERATOR 2 @>(anyarray, anyarray), 
   OPERATOR 3 <@(anyarray, anyarray), 
   OPERATOR 4 =(anyarray, anyarray), 
   FUNCTION 1 uuid_cmp(uuid, uuid), 
   FUNCTION 2 ginarrayextract(anyarray, internal, internal), 
   FUNCTION 3 ginqueryarrayextract(anyarray, internal, smallint, internal, internal, internal, internal), 
   FUNCTION 4 ginarrayconsistent(internal, smallint, anyarray, integer, internal, internal, internal, internal), 
   STORAGE uuid;

Note (quoting the manual):

The user who defines an operator class becomes its owner. Presently, the creating user must be a superuser.

See:

Then create the index:

CREATE INDEX payments_application_tour_ids_gin_idx ON payments
USING gin (f_extract_tour_ids(application));

Match the expression and use an indexable array operator in your queries to be applicable, like:

SELECT *
FROM   payment
WHERE  f_extract_tour_ids(application) @> '{d61802ff-3eec-4a72-97ca-832f51b96bf0}'::uuid[];

As you can see, you can search for any number of UUIDs at once with this query.


Or, if you can't (or won't) create the operator class, you can work with the text representation of the UUID values - which is considerably bigger and hence more expensive.

Adapt f_extract_tour_ids(jsonb) like this:

CREATE OR REPLACE FUNCTION f_extract_tour_ids(jsonb)
  RETURNS text[] AS
$func$
SELECT ARRAY(SELECT (jsonb_array_elements($1) #>> '{tour_id, id}'))
$func$ LANGUAGE sql IMMUTABLE;

And the query:

SELECT *
FROM   payment
WHERE  f_extract_tour_ids(application) @> '{d61802ff-3eec-4a72-97ca-832f51b96bf0}'::text[];

Be sure to match the UUID format exactly.

Related: