PostgreSQL – Efficient way to compute unique values using a PostgreSQL GIN index

Tags: json, postgresql

Say I have a JSONB column with a GIN index on it.

I can use jsonb_object_keys() to fetch the keys for each row. But is there an efficient way of getting the set of unique keys across all rows?

Surely the GIN index must have this information already, right?

NB: I know the "classic" solution: split the single column into a many-to-many relationship.

Best Answer

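This can't be done. Of the two built-in GIN operator classes for indexing JSONB, one ("jsonb_path_ops") stores only hashes, each combining a value with the keys on the path leading to it, so you wouldn't be able to reverse them. The other (the default, "jsonb_ops") stores every key and value as a separate flattened item, irrespective of the nesting level at which it occurs. It also replaces a key or value with its hash if the item exceeds a certain length. Either way, the index does not hold an exact, depth-aware list of keys that could be read back out.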
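To make the difference between the two strategies concrete, here is a toy Python model of the entries each one would produce. It is a simplified sketch, not PostgreSQL's implementation: the function names, the 16-character cut-off (MAX_ITEM_LEN), and the SHA-256-based stand-in hash are all assumptions, and arrays are not modelled.

```python
import hashlib
import json

# Hypothetical cut-off for this sketch: longer items are stored as a hash.
# PostgreSQL has its own internal limit, which is not reproduced here.
MAX_ITEM_LEN = 16


def hash32(text: str) -> int:
    """Stand-in 32-bit hash; PostgreSQL uses a different hash function."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")


def scalar_text(value) -> str:
    """Text form of a scalar JSON value (strings are kept unquoted)."""
    return value if isinstance(value, str) else json.dumps(value)


def item(kind: str, text: str):
    """One jsonb_ops-style entry, hashed if longer than MAX_ITEM_LEN."""
    return (kind, text if len(text) <= MAX_ITEM_LEN else hash32(text))


def jsonb_ops_entries(doc) -> set:
    """jsonb_ops-style: every key and scalar value, nesting depth discarded.

    Arrays are not modelled in this sketch.
    """
    entries = set()
    if isinstance(doc, dict):
        for key, value in doc.items():
            entries.add(item("key", key))
            entries |= jsonb_ops_entries(value)
    else:
        entries.add(item("value", scalar_text(doc)))
    return entries


def jsonb_path_ops_entries(doc, path=()) -> set:
    """jsonb_path_ops-style: one hash per scalar value, mixing in its key path."""
    entries = set()
    if isinstance(doc, dict):
        for key, value in doc.items():
            entries |= jsonb_path_ops_entries(value, path + (key,))
    else:
        entries.add(hash32(json.dumps([list(path), scalar_text(doc)])))
    return entries


a = {"ABCD": {"EFGH": "IJKL"}}
b = {"EFGH": {"ABCD": "IJKL"}}  # same keys, different depths

print(jsonb_ops_entries(a) == jsonb_ops_entries(b))  # True: depth is lost
print(sorted(jsonb_ops_entries(a)))  # [('key', 'ABCD'), ('key', 'EFGH'), ('value', 'IJKL')]
print(jsonb_path_ops_entries(a))  # opaque integers: no key names to read back
```

In this model, two documents with the same keys at different depths yield identical jsonb_ops-style entries, and the jsonb_path_ops-style entries are bare integers, so neither entry set can be turned back into a depth-aware list of keys.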

Both methods can produce false positives, so both rely on "rechecking" each candidate row's full JSONB value in the table to confirm that it really meets the criteria.

For example, with the "jsonb_ops" operator class, the value {"ABCD": {"EFGH": "IJKL"}} will initially match the query @> '{"EFGH":{}}', because the value does have "EFGH" as a key. It then fails the recheck, because that key occurs at the wrong depth.
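The index-then-recheck flow in this example can be sketched in Python. This is a toy model under stated assumptions, not PostgreSQL code: flat_keys, contains, and the sample rows are hypothetical, the index step looks only at keys (ignoring value entries), and arrays are not handled.

```python
def flat_keys(doc) -> set:
    """Every object key at any depth: roughly what a jsonb_ops-style index keeps."""
    keys = set()
    if isinstance(doc, dict):
        for key, value in doc.items():
            keys.add(key)
            keys |= flat_keys(value)
    return keys


def contains(doc, query) -> bool:
    """Simplified @> for objects and scalars: query keys must sit at the same level."""
    if isinstance(query, dict):
        return isinstance(doc, dict) and all(
            key in doc and contains(doc[key], sub) for key, sub in query.items()
        )
    return doc == query


# Hypothetical table: row id -> JSONB value.
rows = {
    1: {"ABCD": {"EFGH": "IJKL"}},
    2: {"EFGH": {"ABCD": "IJKL"}},
    3: {"XYZ": 1},
}
query = {"EFGH": {}}

# Index step: any row containing the key "EFGH" at any depth is a candidate.
candidates = [rid for rid, doc in rows.items() if flat_keys(query) <= flat_keys(doc)]
# Recheck step: evaluate the real containment test on each candidate row.
matches = [rid for rid in candidates if contains(rows[rid], query)]

print(candidates)  # [1, 2]
print(matches)  # [2]
```

Row 1 survives the index step only because the flattened entries cannot say where "EFGH" appears; the recheck against the full value is what removes it.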