I have a table payment
with a jsonb
column application
. It contains payments and the total amount for one or more tours. For example, a parent can pay 1000 $ for all his family, e.g. 4 children and his wife. Then one row in payment
contains 6 records in the jsonb
field application
, describing the tour and the customer.
[
{
"tour_id":
{
"id":"d61802ff-3eec-4a72-97ca-832f51b96bf0","name":"Paris 2018"
},
"amount":"500.00",
"customer_id":
{
"id":"f492b26d-d260-4bcb-8e96-6532fdf38665",
"name":"OBAMA John"
},
"id":"5cd8338c-15c2-48c5-b305-9e172bb33a9a"
},
{
"tour_id":
{
"id":"a52d38d4-9bfb-4ffa-1122-e536f04b0c60","name":"London 2018"
},
"amount":"500.00",
"customer_id":
{
"id":"f492b26d-d260-4bcb-8e96-6532fdf38665",
"name":"OBAMA John"
},
"id":"5cd8338c-15c2-48c5-b305-9e172bb33a9a"
},
]
My id is unique in all the postgres database, I'm using UUID v4. Which is the best way to index this id? Really the name
is redundant here, because I have a table for customers and tours, and I could just record the id:
[
{
"tour_id":"d61802ff-3eec-4a72-97ca-832f51b96bf0",
"amount":"500.00",
"customer_id":"f492b26d-d260-4bcb-8e96-6532fdf38665",
"id":"5cd8338c-15c2-48c5-b305-9e172bb33a9a"
},
(...more elements)
But I'd like to get advise on the best indexing strategy (or alternative solutions).
Best Answer
A jsonb GIN index with the
jsonb_path_ops
operator class would be an option.But to just index (multiple)
application #>> '{*, tour_id, id}')
values perjsonb
field (pseudocode, the*
is illegal), I would write a tiny,IMMUTABLE
SQL function to extract and cast the UUID fields to an UUID array (uuid[]
) - and create an expression GIN index on that. Should be considerably smaller and hence faster:What was missing in my original answer: there is no GIN operator class for
uuid[]
, yet (up to Postgres 10). See this thread on pgsql-hackers.Update: Starting with Postgres 11, the additional module btree_gin provides the missing operator class.
You have to create it yourself (once per database):
Note (quoting the manual):
See:
Then create the index:
Match the expression and use an indexable array operator in your queries to be applicable, like:
As you can see, you can search for any number of UUIDs at once with this query.
Or, if you can't (or won't) create the operator class, you can work with the
text
representation of the UUID values - which is considerably bigger and hence more expensive.Adapt
f_extract_tour_ids(jsonb)
like this:And the query:
Be sure to match the UUID format exactly.
Related: