Postgresql – Can jsonb do fast search by *value* or even a partial value? Other types

database-designfull-text-searchjsonpostgresql

I have a person table where arbitrary data must be inserted. Since I'm using Postgres 9.4, jsonb appears to be the right choice.

Example data:

id: 1, name: "Joe Doe", preferences: { color: "red" , toy: "car"}
id: 2, name: "Jane Doe", preferences: { color: "blue", food: "hamburguer" }

The problem is that I need to query it by the variable values, example:

All the people who has "hamburger" on preferences, and partial queries must be possible too, so searching for "burg" must bring me record 2.

Using json functions, I can search for values, as long as I known the key, right? Or there is a fast way to search for the values?

What I'm thinking to do is to create a tsvector column and dump the json to use fulltext search to do the partial queries.

Best Answer

I suggest a MATERIALIZED VIEW with unnested values and a trigram index as search tools.

hstore

If you don't need nested values, hstore may be even better for you. Use the function svals(hstore) to unnest hstore values.

You need to install the additional module hstore once per database:

CREATE EXTENSION hstore;

Table

CREATE TABLE person AS
SELECT * FROM (
   VALUES
     (1, 'Joe Doe', hstore  'toy=>car, color=>red')
   , (2, 'Jane Doe', 'food=>hamburguer, color=>blue')
   ) t(person_id, name, preferences);

Materialized view

CREATE MATERIALIZED VIEW person_pref AS
SELECT p.person_id, j.preference        -- just pref
FROM   person p, svals(p.preferences) j(preference);

This is an implicit CROSS JOIN LATERAL to the set-returning function svals().

Trigram Index

You need to install the additional module pg_trgm once per database:

CREATE EXTENSION pg_trgm;

Then:

CREATE INDEX person_pref_j_trgm_idx ON person_pref_j
USING gin (preference gin_trgm_ops);

Details:

Query

SELECT *
FROM   person p
WHERE  EXISTS (
   SELECT 1
   FROM   person_pref pp
   WHERE  pp.person_id = p.person_id
   AND    pp.preference ILIKE '%burg%'
   );

Be aware that this is pretty fast.

jsonb

If you have nested values or numeric or boolean values, jsonb may be more efficient. You can do the same as above with jsonb_each_text(jsonb):

CREATE TABLE person AS
SELECT * FROM (
   VALUES
     (1, 'Joe Doe', jsonb  '{"toy": "car", "color": "red"}')
   , (2, 'Jane Doe', '{"food": "hamburguer", "color": "blue"}')
   ) t(person_id, name, preferences);


CREATE MATERIALIZED VIEW person_pref AS
SELECT p.person_id, j.key, j.preference   -- incl. key or just pref?
FROM  person_j p, jsonb_each_text(p.preferences) j(key, preference);

Same index, same query. You might want to add the key to the MV and search for that, too:

CREATE INDEX person_pref_trgm_idx ON person_pref_j
USING GIN (key gin_trgm_ops, preference gin_trgm_ops);