PostgreSQL – How to Define a GIN Index Across Two Columns/Arrays

postgresql

Let's say we have a Postgres table contacts, each record having a bunch of labeled email addresses (pairs of labels and emails)– one of which is the "primary".

This is stored like:

  • id primary key
  • email text
  • email_label text
  • metadata jsonb
    • emails array
    • email text
    • label text

For example, a record might look something like:

{
  id: 1,
  email: 'a@a.com',
  email_label: 'a',
  metadata: {
    emails: [
      {
        email: 'b@b.com',
        label: 'b'
      },
      {
        email: 'c@c.com',
        label: 'c'
      }
    ]
  }
}

Given this storage pattern, we want to be able to find a record by any of its email addresses.

The naive query would look like:

SELECT id
FROM contacts
WHERE
  email = 'my@email.com' OR
  metadata -> 'emails' @> '[{"email": "my@email.com"}]'

Is there any way to create an index that speeds up this operation significantly? It would need to update automatically in response to changes to the records, and ideally index across both the text column and the nested JSONB column.

The specific use case here would be able to do lookups by email address efficiently and quickly, without overhauling this structure or creating a new relational table.

I believe the solution involves using a GIN index and this question mentions jsonb_path_ops but I'm not sure how to wrangle all the pieces together.

Best Answer

Since you want to run the query on metadata->'emails', you need to build the index on that expression:

create index on contacts using gin ((metadata->'emails'));

Alternatively, you could make the index on metadata itself, and re-formulate the query condition like this:

... OR metadata @> '{"emails":[{"email": "my@email.com"}]}'

In either case, you also need an index on the column "email", to satisfy the other branch of the OR condition. Then the two indexes will be combined with a BitmapOr operation.

It is possible you could make one index on both columns using some complicated expression index, but that doesn't really seem worthwhile.