PostgreSQL Indexing – PostgreSQL Not Using Index

postgresql

I have a very simple table:

CREATE TABLE content
(
  id serial NOT NULL,
  text text,
  fullfilename text,
  CONSTRAINT "PK_ID" PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE content
  OWNER TO postgres;

CREATE INDEX content_idx
  ON content
  USING gin
  (to_tsvector('danish'::regconfig, text));

Where text is a rather long aggregated text to perform full text searching.

When I run the following query it resorts to a seq scan:

EXPLAIN ANALYZE SELECT id
FROM content
WHERE to_tsvector('danish', text) @@ to_tsquery('danish','felter')

"Seq Scan on content  (cost=0.00..164.57 rows=249 width=4) (actual time=41.147..7235.823 rows=289 loops=1)"
"  Filter: (to_tsvector('danish'::regconfig, text) @@ '''felt'''::tsquery)"
"  Rows Removed by Filter: 1149"
"Planning time: 0.366 ms"
"Execution time: 7235.914 ms"

That is absolutely not okay. But when disabling seq scans with SET enable_seqscan TO 'off' I get the following result:

"Bitmap Heap Scan on content  (cost=17.94..168.53 rows=249 width=4) (actual time=0.145..0.323 rows=289 loops=1)"
"  Recheck Cond: (to_tsvector('danish'::regconfig, text) @@ '''felt'''::tsquery)"
"  Heap Blocks: exact=70"
"  ->  Bitmap Index Scan on content_idx  (cost=0.00..17.87 rows=249 width=0) (actual time=0.121..0.121 rows=289 loops=1)"
"        Index Cond: (to_tsvector('danish'::regconfig, text) @@ '''felt'''::tsquery)"
"Planning time: 0.373 ms"
"Execution time: 0.383 ms"

What exactly is going on, and what parameters needs tuning for it to run better. I don't like the idea of removing a tool from the query planners toolbox. As it stands, the setup is running a stock PostgreSQL 9.4.4

Best Answer

The problem here is that it thinks to_tsvector is a very cheap operation to perform, even when performing it on an entire table. Basically, it think it is the same cost as adding together two integers. But of course it is not. In the upcoming release 9.5, the default cost for that function has been increased 100-fold (which is probably still not enough to be accurate...but it should be enough to tip the scales for your case, which is all you need).

You can change the cost of that function yourself:

alter function to_tsvector ( regconfig, text) cost 1000;

(Note that because this is a built-in function, changes you make will not survive a pg_upgrade, or a dump, initdb and restore cycle)

This is much safer than fiddling around with the global cost parameters.

The general (global) cost parameters are described here, and ones for specific functions are described here. The way the two interact with each other don't seem to be well-described anywhere in the docs.