Postgresql – How to count selectivity rows in PostgreSQL 8.2

index-tuningoptimizationpostgresqlpostgresql-8.2

I have problem with two queries on PostgreSQL v8.2. It seem like the optimizer chooses a wrong index:

db=# explain 
select count(*), messagetype, alias 
from event 
where 
    templateinfoid = 10 and 
    templateid = 12458 and 
    '2013-03-01 00:00:00' < gw_out_time  and 
    '2013-03-31 23:59:59'  >= gw_out_time  
group by messagetype, alias 
order by alias asc;

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1128.74..1128.74 rows=1 width=14)
   Sort Key: alias
   ->  HashAggregate  (cost=1128.72..1128.73 rows=1 width=14)
         ->  Index Scan using idx_event_tid_tinfoid_outtime on event  (cost=0.00..1125.93 rows=372 width=14)
               Index Cond: ((templateinfoid = 10) AND (templateid = 12458) AND ('2013-03-01 00:00:00'::timestamp without time zone < gw_out_time) AND ('2013-03-31 23:59:59'::timestamp without time zone >= gw_out_time))
(5 rows)

db=# explain 
select count(*), messagetype, alias 
from event 
where 
    templateinfoid = 10 and 
    templateid = 12458 and 
    '2013-03-02 00:00:00' < gw_out_time  and 
    '2013-03-31 23:59:59'  >= gw_out_time  
group by messagetype, alias 
order by alias asc;
                                                                               QUERY PLAN                                                                               
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=45.48..45.51 rows=1 width=14)
   ->  Sort  (cost=45.48..45.49 rows=1 width=14)
         Sort Key: alias, messagetype
         ->  Index Scan using idx_event_gw_out_time on event  (cost=0.00..45.47 rows=1 width=14)
               Index Cond: (('2013-03-02 00:00:00'::timestamp without time zone < gw_out_time) AND ('2013-03-31 23:59:59'::timestamp without time zone >= gw_out_time))
               Filter: ((templateinfoid = 10) AND (templateid = 12458))
(6 rows)

I have read 54.1. Row Estimation Examples (http://www.postgresql.org/docs/8.2/static/row-estimation-examples.html) article and I counted the selectivity of rows by templateid and templateinfoid columns.
My stats:

db # select * from pg_stats where tablename = 'event' and attname IN ('gw_out_time','templateid','templateinfoid');
-[ RECORD 1 ]-----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname        | public
tablename         | event
attname           | gw_out_time
null_frac         | 0.00166667
avg_width         | 8
n_distinct        | -1
most_common_vals  | 
most_common_freqs | 
histogram_bounds  | {"2010-01-03 18:13:27.334","2010-11-21 08:44:41.704","2011-05-06 22:08:09.59","2011-09-15 22:21:14.904","2011-12-14 15:03:18.674","2012-03-20 14:18:21.59","2012-06-22 10:22:33.607","2012-09-13 11:19:39.55","2012-11-22 12:34:26.521","2012-12-30 17:50:00.672","2013-03-01 21:04:18.16"}
correlation       | 0.989909
-[ RECORD 2 ]-----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname        | public
tablename         | event
attname           | templateid
null_frac         | 0.02
avg_width         | 8
n_distinct        | 439
most_common_vals  | {3182,5125,699,8415,9616,8975,6249,934,5320,2877}
most_common_freqs | {0.078,0.0663333,0.043,0.0403333,0.0403333,0.0363333,0.0303333,0.0276667,0.0206667,0.0186667}
histogram_bounds  | {42,2271,4944,5597,6904,8824,9674,10135,10759,12356,12946}
correlation       | 0.702483
-[ RECORD 3 ]-----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname        | public
tablename         | event
attname           | templateinfoid
null_frac         | 0.02
avg_width         | 8
n_distinct        | 8
most_common_vals  | {10,3,6,2,5,12,11,9}
most_common_freqs | {0.697667,0.184333,0.0623333,0.0176667,0.011,0.005,0.00133333,0.000666667}
histogram_bounds  | 
correlation       | 0.741422

and

db # SELECT reltuples, relpages FROM pg_class WHERE relname = 'event';
  reltuples  | relpages 
-------------+----------
 2.74586e+08 | 18752867
(1 row)

so I did:

templateinfoid selectivity: 
    0.697667 * 18752867 = 13.083.256,461289

templateid selectivity: 
    (1 + (12946 - bckt[2].min)/(bckt[2].max - bckt[2].min))/num_bckts * 18752867 = 
    (1 + (12458 - 12356)/(12946 - 12356))/10 * 18752867 = 
    0,117288136 * 18752867 = 
    2.199.488,807457627

How to count rows selectivity for the gw_out_time column which is timestamp?

Best Answer

Consider

INDEX(templateinfoid, templateid,  -- first, but in either order    
      gw_out_time,
      messagetype, alias)  -- last, for "covering" (either order)

Also, consider either of these (to avoid an extra sort):

group by messagetype, alias 
order by messagetype, alias 

or

group by alias, messagetype
order by alias, messagetype