Postgresql – Postgres full text search with multiple columns, why concat in index and not at runtime


I've come across full text search in postgres in the last few days, and I am a little confused about indexing when searching across multiple columns.

The postgres docs talk about creating a ts_vector index on concatenated columns, like so:

CREATE INDEX pgweb_idx ON pgweb 
    USING gin(to_tsvector('english', title || ' ' || body));

which I can search like so:

... WHERE 
      (to_tsvector('english', title||' '||body) @@ to_tsquery('english', 'foo'))

However, if I wanted to sometimes search just the title, sometimes just the body, and sometimes both, I would need 3 separate indexes. And if I added in a third column, that could potentially be 6 indexes, and so on.

An alternative which I haven't seen in the docs is just to index the two columns seperately, and then just use a normal WHERE...OR query:

      (to_tsvector('english', title) @@ to_tsquery('english','foo'))
      (to_tsvector('english', body) @@ to_tsquery('english','foo'))

Benchmarking the two on ~1million rows seems to have basically no difference in performance.

So my question is:

Why would I want to concatenate indexes like this, rather than just indexing columns individually? What are the advantages/disadvantages of both?

My best guess is that if I knew in advance I would only want to ever search both columns (never one at a time) I would only ever need one index by concatenating which use less memory.

Best Answer

No you don't need separate indexes. Use the weights feature. They are just a label your can query against. You can have up to four labels to query against (A-D).

--search any "field" for quick:
select 'quick:1A brown:2B quick:3C'::tsvector @@ 'quick'::tsquery; --true

--search B "field" for quick:
select 'quick:1A brown:2B quick:3C'::tsvector @@ 'quick:B'::tsquery; --false

--search B or C "fields" for quick:
select 'quick:1A brown:2B quick:3C'::tsvector @@ 'quick:BC'::tsquery; --true

You might want to concatenate tsvectors, so that you can separately apply weights to them and then put them together:

  setweight( name_column::tsvector, 'A') || setweight( phone_column::tsvector, 'B');