Assessment
In your last query, the bitmap index scan looking for 'hat' produces 307 hits.
Postgres then runs a bitmap heap scan to filter merchants similar enough (similarity(...) > 0.2), producing 12 rows. Your test is with 30K rows, so your real-life query will produce around 300 times as many hits: roughly 90k index hits and 3.5k rows after filtering for the test case at hand. An additional index on merchant will help.
Advice
I suggest you create an additional trigram index for the similarity search. Be sure to read the chapter in the manual about trigram index support. This requires the additional module pg_trgm to be installed (which you obviously have).
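To get a feel for what the trigram functions return before building an index, a minimal sketch like the following may help. The sample strings are made up for illustration; the functions themselves come with pg_trgm.

```sql
-- Install the module once per database (requires sufficient privileges).
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Show the trigrams pg_trgm extracts from a string.
SELECT show_trgm('walmart');

-- Similarity is a number between 0 (no shared trigrams) and 1 (identical trigram sets).
-- The second argument is an illustrative misspelling, not data from the question.
SELECT similarity('walmart', 'wal-mart');

-- Current threshold used by the % operator.
SELECT show_limit();
```

On newer Postgres versions (9.6 and later, as far as I know) the threshold can also be changed with SET pg_trgm.similarity_threshold = 0.2; instead of set_limit().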
For your first request:
How can I search for a query like 'WALMART BAGS' which will first
return me product BAG with merchant WALMART and then BAGS from other merchants.
I suggest this query using the similarity operator %:
-- SELECT set_limit(0.2);  -- adjust similarity threshold only if needed
SELECT *
FROM   products
WHERE  to_tsvector('english', product) @@ to_tsquery('bag')
AND    merchant % 'walmart'
ORDER  BY merchant <-> 'walmart'
-- LIMIT n  -- possibly limit to top n results
;
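Note that the % condition removes merchants below the similarity threshold altogether. If matching products from other merchants should still appear after the close matches, one possible variant drops that filter and relies on the ordering alone. This is my reading of the quoted request, not something the question confirms, and the LIMIT value is arbitrary.

```sql
-- Variant (assumption): keep every product matching 'bag',
-- but list merchants closest to 'walmart' first.
SELECT *
FROM   products
WHERE  to_tsvector('english', product) @@ to_tsquery('english', 'bag')
ORDER  BY merchant <-> 'walmart'   -- smaller distance = more similar
LIMIT  20;                          -- arbitrary cap for illustration
```

Without the % filter the planner may prefer the full text index and sort the result afterwards, so it is worth checking the plan with EXPLAIN.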
Again, you can choose between GiST and GIN, but this time GiST carries a decisive advantage for the ORDER BY with the distance operator <->. The manual says:
This can be implemented quite efficiently by GiST indexes, but not by
GIN indexes. It will usually beat the first formulation when only a
small number of the closest matches is wanted.
Therefore, I suggest this index:
CREATE INDEX prod_merchant_trgm_idx ON products USING gist (merchant gist_trgm_ops);
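To check whether the new index is actually used for the nearest-neighbor ordering, something along these lines should do. It assumes the index above has been created and the table has been analyzed; the LIMIT is arbitrary.

```sql
-- Refresh planner statistics after creating the index.
ANALYZE products;

-- Look for an index scan on prod_merchant_trgm_idx with an "Order By" line.
-- Note that EXPLAIN ANALYZE really executes the query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   products
ORDER  BY merchant <-> 'walmart'
LIMIT  10;
```

If the plan shows a sequential scan followed by a sort instead, the table may simply be too small for the index to pay off.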
As for your second request:
Can I have both GIN and GIST index working for me?
Yes, you can. It would hardly make sense to have both types for the same (combination of) column(s), but Postgres can combine GiST and GIN indices very well in the same query. I quote the excellent manual yet again, on Combining Multiple Indexes:
To combine multiple indexes, the system scans each needed index and
prepares a bitmap in memory giving the locations of table rows that
are reported as matching that index's conditions. The bitmaps are then
ANDed and ORed together as needed by the query. Finally, the actual
table rows are visited and returned. The table rows are visited in
physical order, because that is how the bitmap is laid out; this means
that any ordering of the original indexes is lost, and so a separate
sort step will be needed if the query has an ORDER BY clause. For this
reason, and because each additional index scan adds extra time, the
planner will sometimes choose to use a simple index scan even though
additional indexes are available that could have been used as well.
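A rough way to see this in practice is to have one index per condition and inspect the plan. The full text index name below is hypothetical (you probably have an equivalent index already, in which case skip the CREATE INDEX), and whether the planner combines the two indexes depends on its cost estimates.

```sql
-- Hypothetical name; GIN index for the full text condition.
CREATE INDEX prod_product_fts_idx ON products
USING  gin (to_tsvector('english', product));

-- With both indexes in place, the plan may show a BitmapAnd node
-- combining two bitmap index scans, or just a single index scan plus a filter.
EXPLAIN (ANALYZE)
SELECT *
FROM   products
WHERE  to_tsvector('english', product) @@ to_tsquery('english', 'bag')
AND    merchant % 'walmart';
```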
Best Answer
Your only chance is to index all possible conditions. Perhaps PostgreSQL will use bitmap operations to cover the ORs; perhaps it will decide that a sequential scan is cheaper and go for that.
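As a sketch of what that can look like, the query below ORs two conditions that each have their own index. The conditions are borrowed from the products example purely for illustration and are an assumption, not the original poster's actual query.

```sql
-- If each side of the OR is covered by an index, the plan may contain
-- a BitmapOr node over two bitmap index scans; otherwise expect a Seq Scan.
EXPLAIN
SELECT *
FROM   products
WHERE  to_tsvector('english', product) @@ to_tsquery('english', 'bag')
OR     merchant % 'walmart';
```

If any one branch of the OR has no usable index, the whole condition generally cannot be answered from indexes alone.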