PostgreSQL – Slow ORDER BY with LIMIT Solutions

full-text-searchindexperformancepostgresql

I have this query:

SELECT * 
FROM location 
WHERE to_tsvector('simple',unaccent2("city"))
   @@ to_tsquery('simple',unaccent2('wroclaw')) 
order by displaycount

I'm happy with it:

"Sort  (cost=3842.56..3847.12 rows=1826 width=123) (actual time=1.915..2.084 rows=1307 loops=1)"
"  Sort Key: displaycount"
"  Sort Method: quicksort  Memory: 206kB"
"  ->  Bitmap Heap Scan on location  (cost=34.40..3743.64 rows=1826 width=123) (actual time=0.788..1.208 rows=1307 loops=1)"
"        Recheck Cond: (to_tsvector('simple'::regconfig, unaccent2((city)::text)) @@ '''wroclaw'''::tsquery)"
"        ->  Bitmap Index Scan on location_lower_idx  (cost=0.00..33.95 rows=1826 width=0) (actual time=0.760..0.760 rows=1307 loops=1)"
"              Index Cond: (to_tsvector('simple'::regconfig, unaccent2((city)::text)) @@ '''wroclaw'''::tsquery)"
"Total runtime: 2.412 ms"

But when I add LIMIT, the execution takes more than 2 seconds:

SELECT * 
FROM location 
WHERE to_tsvector('simple',unaccent2("city"))
   @@ to_tsquery('simple',unaccent2('wroclaw')) 
order by displaycount 
limit 20

Explain:

"Limit  (cost=0.00..1167.59 rows=20 width=123) (actual time=2775.452..2775.643 rows=20 loops=1)"
"  ->  Index Scan using location_displaycount_index on location  (cost=0.00..106601.25 rows=1826 width=123) (actual time=2775.448..2775.637 rows=20 loops=1)"
"        Filter: (to_tsvector('simple'::regconfig, unaccent2((city)::text)) @@ '''wroclaw'''::tsquery)"
"Total runtime: 2775.693 ms"

I think that it's some issue with ORDER BY and LIMIT. How can I force PostgreSQL to use the index and do the ordering at the end?

Subquery doesn't help:

SELECT * 
FROM (
    SELECT * 
    FROM location 
    WHERE to_tsvector('simple',unaccent2("city"))
       @@ to_tsquery('simple',unaccent2('wroclaw')) 
    order by displaycount
) t 
LIMIT 20;

or:

SELECT * 
FROM (
    SELECT * 
    FROM location 
    WHERE to_tsvector('simple',unaccent2("city"))
       @@ to_tsquery('simple',unaccent2('wroclaw'))
) t 
order by displaycount 
LIMIT 20;

Best Answer

My guess is that this would fix your query:

SELECT * 
FROM   location 
WHERE     to_tsvector('simple',unaccent2(city))
       @@ to_tsquery('simple',unaccent2('wroclaw')) 
ORDER  BY to_tsvector('simple',unaccent2(city))
       @@ to_tsquery('simple',unaccent2('wroclaw')) DESC
         ,displaycount 
LIMIT  20;

I repeat the WHERE condition as first element of the ORDER BY clause - which is logically redundant, but should keep the query planner from assuming it would be better off processing rows according to the index location_displaycount_index - which turns out to be much more expensive.

The underlying problem is that the query planner obviously grossly misjudges the selectivity and / or cost of your WHERE condition. I can only speculate why that is.

Do you have autovacuum running - which should also take care of running ANALYZE on your tables? Thereby, are your table-statistics up-to-date? Any effect if you run:

ANALYZE location;

And try again?

It can also be that the selectivity of the @@ operator is being misjudged. I would imagine that it is very hard to estimate for logical reasons.


If my query should not solve the problem, and generally to verify the underlying theory do one of these two things:

  • temporarily delete the index location_displaycount_index

  • temporarily disable basic index scans by running:

    SET enable_indexscan = OFF;
    

The latter is less intrusive and only affects the current session. It leaves the methods bitmap heap scan and bitmap index scan open, which are used by the faster plan.
Then re-run the query.

BTW: If the theory is sound, your query (as you have it now) will be much faster with a less selective search term in the FTS condition - contrary to what you might expect. Try it.