Postgresql – Attempting to avoid SORT in query planner

indexperformancepostgresqlquery-performance

I have a query that is used VERY frequently in my application. Without modifying the query itself, I'm trying to optimize it by toying with different indexing schemes. Here's the query (basically):

SELECT foo from big_table, status_types
WHERE big_table.attr=1 AND 
  lower(big_table.groupname) LIKE lower('Bar%') AND 
  big_table.type = status_types.typeid
ORDER BY big_table.groupname DESC offset 0 limit 20;

 Limit  (cost=24511.42..24511.45 rows=14 width=150)
 ->  Sort  (cost=24511.42..24511.45 rows=14 width=150)
      Sort Key: big_table.groupname
     ->  Nested Loop  (cost=0.00..24511.15 rows=14 width=150)
           ->  Seq Scan on big_table (cost=0.00..24495.23 rows=14 width=72)
                 Filter: ((big_table.attr = 1) AND (lower(groupname) ~~ 'Bar%'::text))
           ->  Index Scan using typeid on status_types (cost=0.00..1.12 rows=1 width=86)
                 Index Cond: (status_types.typeid = big_table.type)

I have indicies on big_table for lower(groupname) and typeid is a PK for table status_types.
I would think it should be using the index on big_table (on groupname) to avoid sorting the results. I have tried several different options, even changing the query to something as simple as "select * from big_table where groupname LIKE 'foo%' order by group_name". Still performs the sort!!

I must be doing something wrong. Shouldn't the query planner favor the use of the index and NOT sorting?

Best Answer

You may get an ordered reverse index scan if you had an index on big_table( attr, lower(groupname) ) for your example query. The explain you posted estimates 14 rows output from the query, in which case the final sort is trivial and not worrying about. Note the sort estimation cost "Sort (cost=24511.42..24511.45 rows=14 width=150)" shows there's a big cost generating input to the sort (24511.42) and very little incremental cost for the sort (24511.45-24511.42 => 0.03).

You should generally run EXPLAIN with the ANALYZE option so you can see real times, not just estimates. Divergences between the actual time/rows from the estimated costs/rows is a signal that the planner has bad statistics or lacks adequate rules for proper estimation. DEPESZ's explain explainer tool is often handy for hunting the real problems with your queries.