Postgresql – Postgres Index scan forward vs backward = speed difference of 357X slower

postgresql

PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2
20080704 (Red Hat 4.1.2-51), 64-bit

Dedicated DB server

  • 4GB ram
  • Shared_Buffers = 1 GB
  • Effective_cache_size = 3GB
  • Work_mem = 32MB

Analyze done

Queries ran multiple times, same differences/results

Default Statistics = 1000

Query (5366ms) :

explain analyze
select
    initcap (fullname)
  , initcap(issuer)
  , upper(rsymbol)
  , initcap(industry)
  , activity
  , to_char(shareschange,'FM9,999,999,999,999,999')
  , sharespchange || + E'\%'
from changes
where activity in (4,5) and mfiled >= (select max(mfiled) from changes)
order by shareschange asc
limit 15

Slow Ascending explain Analyze:

http://explain.depesz.com/s/zFz

Query (15ms) :

explain analyze
select
    initcap (fullname)
  , initcap(issuer)
  , upper(rsymbol)
  , initcap(industry)
  , activity
  , to_char(shareschange,'FM9,999,999,999,999,999')
  , sharespchange ||+ E'\%'
from changes
where activity in (4,5) and mfiled >= (select max(mfiled) from changes)
order by shareschange desc limit 15

Fast descending explain analyze:

http://explain.depesz.com/s/OP7

The index: changes_shareschange is a btree index created with default
ascending order. The is index size is 32mb

The query plan and estimates are exactly the same, except desc has index
scan backwards instead of index scan for changes_shareschange.

Yet, actual runtime performance is different by 357x slower for the
ascending version instead of descending.

Why and how do I fix it?

Best Answer

Since I like replacing aggregate functions by old-fashioned self-joins and NOT EXISTS clauses, here is my attempt:

SET search_path='tmp';

DROP TABLE tmp.changes CASCADE;
CREATE TABLE tmp.changes
        ( id integer NOT NULL PRIMARY KEY
        , fullname varchar
        , issuer varchar
        , rsymbol varchar
        , industry varchar
        , activity INTEGER NOT NULL
        , shareschange FLOAT
        , sharespchange FLOAT
        , mfiled FLOAT
        );

        -- lacking information from the OP
        -- I can only presume a flat distribution.
INSERT INTO tmp.changes(id, activity, shareschange,sharespchange,mfiled )
SELECT nm.*
        , (random() *20)::integer -- mfiled
        , random() *10000
        , random() *100
        , random() *100000
FROM generate_series(1,1000000) nm
        ;

ALTER TABLE tmp.changes
        ALTER shareschange
        SET STATISTICS 1000
        ;
ALTER TABLE tmp.changes
        ALTER mfiled
        SET STATISTICS 1000
        ;

VACUUM ANALYZE tmp.changes
        ;


CREATE INDEX changes_mfiled_shareschange
    ON tmp.changes(mfiled,shareschange)
        ;

EXPLAIN ANALYZE
SELECT initcap(ch.fullname) AS some_name1
     , initcap(ch.issuer) AS some_name2
     , upper(ch.rsymbol) AS some_name3
     , initcap(ch.industry) AS some_name4
     , ch.activity
     , to_char(ch.shareschange,'FM9,999,999,999,999,999') AS some_name5
     , ch.sharespchange || '%' AS some_name6
FROM   changes ch
WHERE  ch.activity IN (4,5)
        -- NOTE: the subquery is *not* correlated.
        -- [I had expected a subselect of nx.activity IN (4,5)
        -- like in the main query. ]
AND    NOT EXISTS (SELECT * FROM changes nx
        WHERE nx.mfiled > ch.mfiled
        )
ORDER  BY ch.shareschange ASC
LIMIT  15
        ;