What I still don't understand, is why this is slower.
That sorting the rows will cost something is obvious. But why so much?
Without ORDER BY rank0...
Postgres can just
- pick the first 5 rows it finds and stop fetching rows as soon as it has 5.
Bitmap Heap Scan on entities ... rows=5 ...
- then compute
ts_rank()
for just 5 rows.
In the second case, Postgres has to
- fetch all (1495 according to your query plan) rows that qualify.
Bitmap Heap Scan on entities ... rows=1495 ...
- compute
ts_rank()
for all of them.
- sort all of them to find the first 5 according to the calculated value.
Try
ORDER BY name
just to see the cost of computing
to_tsquery('english', 'hockey'::text))
for the superfluous rows and how much remains for fetching more rows and sorting.
- You don't. You can't create an index that references another table.
- To set the default see
default-text-search-config
Workaround
You can however use functions in an index. And functions can references external tables. That said, using this is kind of hack because changing the table (common.lang
) will require reindexing and clearing the session cache.
CREATE FUNCTION common.mylookup(id int)
RETURNS regconfig AS $$
SELECT name::regconfig
FROM common.lang
WHERE id = id
$$
LANGUAGE sql
IMMUTABLE;
CREATE INDEX
ON source.user
USING GIN (to_tsvector(common.mylookup(profile_lang_id), name || ' ' || screen_name ));
You can mark functions as IMMUTABLE
which makes this permissible. If the underlying table mutates you'll have to REINDEX
the index. In projects that use this hack, like PostGIS, they recreate indexes on point releases.
Followup
@EvanCarroll, I'm trying to create the tsvector on name || ' ' || screen_name. – Brooks 4 mins ago
Full text search isn't there to do what you think it does. It's not there to search multiple fields. It's there to vectorize word content and make use of dictionaries, stubbing, lexers, gazetteers, stop-word elimination, and a slew of other tricks none of which apply. If this doesn't make sense to you, you'll have to read the docs. If what you want is grep then FTS is only seldom what you want. If you want to grep over small chunks of non-standard text (like names) it's not what you want. What you likely want trigram indexing.
If all you want is a %term%
on two fields, you're better off just doing that with a trigram index.
CREATE EXTENSION pg_trgm;
CREATE INDEX ON source.user USING GIN ((name || ' ' || screen_name) gin_trgm_ops);
WHERE name || ' ' || screen_name like '%$1%';
Or even better,
CREATE INDEX ON source.user USING GIN (name gin_trgm_ops, screen_name gin_trgm_ops);
WHERE name LIKE '%$1%' OR screen_name LIKE '%$1%';
Best Answer
That's because full-text search treats hyphenated words specially:
The numbers behind the lexemes mark the position they had in the original text (
cream
is the third word, and so on). That is used for phrase search.You see that the original hyphenated word is at the second position, and the parts are represented as following the hyphenated word.
So
ice cream
is not the same asice-cream
for PostgreSQL full text search. In the first case,ice
immediately followsidream
, but not in the second case. That is why your query returnsFALSE
.Look at what the parser does:
Perhaps the solution you are looking for would be to ignore hyphenated words and just keep their parts: