My guess is that this would fix your query:
SELECT *
FROM location
WHERE to_tsvector('simple',unaccent2(city))
@@ to_tsquery('simple',unaccent2('wroclaw'))
ORDER BY to_tsvector('simple',unaccent2(city))
@@ to_tsquery('simple',unaccent2('wroclaw')) DESC
,displaycount
LIMIT 20;
I repeat the WHERE
condition as first element of the ORDER BY
clause - which is logically redundant, but should keep the query planner from assuming it would be better off processing rows according to the index location_displaycount_index
- which turns out to be much more expensive.
The underlying problem is that the query planner obviously grossly misjudges the selectivity and / or cost of your WHERE
condition. I can only speculate why that is.
Do you have autovacuum running - which should also take care of running ANALYZE
on your tables? Thereby, are your table-statistics up-to-date? Any effect if you run:
ANALYZE location;
And try again?
It can also be that the selectivity of the @@
operator is being misjudged. I would imagine that it is very hard to estimate for logical reasons.
If my query should not solve the problem, and generally to verify the underlying theory do one of these two things:
temporarily delete the index location_displaycount_index
temporarily disable basic index scans by running:
SET enable_indexscan = OFF;
The latter is less intrusive and only affects the current session. It leaves the methods bitmap heap scan
and bitmap index scan
open, which are used by the faster plan.
Then re-run the query.
BTW: If the theory is sound, your query (as you have it now) will be much faster with a less selective search term in the FTS condition - contrary to what you might expect. Try it.
Given the following setup:
regress=> CREATE SCHEMA A;
CREATE SCHEMA
regress=> CREATE SCHEMA B;
CREATE SCHEMA
regress=> SET search_path = B, public;
SET
regress=> CREATE TABLE bar(email text);
CREATE TABLE
CREATE UNIQUE INDEX index_bar_on_email ON bar USING btree (email);
CREATE INDEX
I cannot reproduce the problem you report in PostgreSQL 9.2:
regress=> SET search_path = A, B;
SET
regress=> CREATE TABLE bar(email text);
CREATE TABLE
regress=> CREATE UNIQUE INDEX index_bar_on_email ON bar USING btree (email);
CREATE INDEX
However, rather than using the search_path
, it's safer to use explicit schema-qualification. For example, I'd re-write the above as:
regress=> RESET search_path;
RESET
regress=> SHOW search_path ;
search_path
----------------
"$user",public
(1 row)
CREATE TABLE B.bar(email text);
CREATE UNIQUE INDEX b.index_bar_on_email ON b.bar USING btree (email);
CREATE TABLE A.bar(email text);
CREATE UNIQUE INDEX index_bar_on_email ON A.bar USING btree (email);
The indexes are automatically created in the schema of their associated table; see:
regress=> \di B.
List of relations
Schema | Name | Type | Owner | Table
--------+--------------------+-------+-------+-------
b | index_bar_on_email | index | craig | bar
(1 row)
regress=> \di A.
List of relations
Schema | Name | Type | Owner | Table
--------+--------------------+-------+-------+-------
a | index_bar_on_email | index | craig | bar
(1 row)
Update based on question change:
Yes, what you've shown does look like a Rails adapter issue. It's checking to see whether the index exists in any schema. It should be checking to see whether the first table of the given name in the search_path
has the named index.
I would write the query differently. I'd leave off the join on pg_class
entirely, instead using a cast to regclass
to handle search_path
resolution for me. I'd use the resulting oid to search for the index. Compare original, then updated, below. Note that the updated query does require search_path
to be set first.
regress=> SELECT DISTINCT i.relname, d.indisunique, d.indkey, t.oid, am.amname
FROM pg_class t, pg_class i, pg_index d, pg_am am
WHERE i.relkind = 'i'
AND d.indexrelid = i.oid
AND d.indisprimary = 'f'
AND t.oid = d.indrelid
AND t.relname = 'bar'
AND i.relnamespace IN (SELECT oid FROM pg_namespace WHERE nspname IN ('b','a') )
AND i.relam = am.oid
ORDER BY i.relname
;
relname | indisunique | indkey | oid | amname
--------------------+-------------+--------+-------+--------
index_bar_on_email | t | 1 | 28585 | btree
index_bar_on_email | t | 1 | 28592 | btree
(2 rows)
regress=> SELECT DISTINCT i.relname, d.indisunique, d.indkey, 'bar'::regclass::oid, am.amname
FROM pg_class i, pg_index d, pg_am am
WHERE i.relkind = 'i'
AND d.indexrelid = i.oid
AND d.indisprimary = 'f'
AND 'bar'::regclass = d.indrelid
AND i.relam = am.oid
ORDER BY i.relname
;
relname | indisunique | indkey | regclass | amname
--------------------+-------------+--------+----------+--------
index_bar_on_email | t | 1 | 28585 | btree
(1 row)
Best Answer
If there are multiple rows with the lowest value for
index
, then no, there is no guarantee you will always get the same row. If you only specify ordering for a column that contains duplicates, then the database does not have to provide anything more than any row that meets the criteria.Same would be true if you were to say
ORDER BY index DESC
- which row with 5 would you expect? Why?In order to force determinism / predictability, add a tie-breaker on a column that is unique (or at least unique within the same values of
index
). With your sample data, you could say:But you haven’t shared enough about your table structure and constraints to get a more specific suggestion than that.