Postgresql – Most efficient way to retrieve data by timestamps

indexperformancepostgresqlpostgresql-9.2postgresql-performance

I'm using PostgreSQL 9.2.8.

I have table like:

CREATE TABLE foo
(
    foo_date timestamp without time zone NOT NULL,
    -- other columns, constraints
)

This table contains about 4.000.000 rows. One day data is about 50.000 rows.

My goal is to retrieve one day data as fast as possible.

I have created an index like:

CREATE INDEX foo_foo_date_idx
ON foo
USING btree
      (date_trunc('day'::text, foo_date));

And now I'm selecting data like this (now() is just an example, i need data from ANY day):

select * 
from foo
where date_trunc('day'::text, now()) = date_trunc('day'::text, foo_date) 

This query lasts about 20 s.

Is there any possiblity to obtain same data in shorter time?

EDIT:

EXPLAIN ANALYZE of select above:

"Index Scan using foo_foo_date_idx on foo (cost=0.01..20762.83 rows=10421 width=1534) (actual time=0.180..249.126 rows=19773 loops=1)"
"  Index Cond: (date_trunc('day'::text, now()) = date_trunc('day'::text, foo_date))"
"Total runtime: 251.883 ms"

I have changed a day for 15-12-2015. What is more, I have replace * with only specific columns. Explain plan:

"Index Scan using foo_foo_date_idx on foo (cost=0.00..57876.52 rows=29184 width=62) (actual time=22.384..28243.356 rows=33153 loops=1)"
"  Index Cond: ('2015-12-14 00:00:00'::timestamp without time zone = date_trunc('day'::text, foo_date))"
"Total runtime: 28251.876 ms"

Last query could cached i think then… Any other explaination?

Best Answer

To check if an index is working, use SELECT COUNT(*) rather than SELECT *, otherwise you're not only measuring index performance but also the time it takes to transfer all the data across your client application, which in this case is a whooping 50,000 rows!

You can significantly simplify your query and index by date datatype instead of timestamp, which is smaller and therefore it should perform better:

CREATE INDEX foo_foo_date_idx ON foo
USING btree ((foo_date::date));

select count(*) from foo
where foo_date::date = CURRENT_DATE;

You could reduce that index even further by using a smallint calculated as the difference between a set date (e.g. 1/1/2010 +- 32768 days = Apr-1920 to Sep-2099):

CREATE INDEX foo_foo_date_idx ON foo
USING btree (((foo_date::date - '2010-01-01'::date)::int2));

select count(*) from foo
where (foo_date::date - '2010-01-01'::date)::int2 = ('today'-'2010-01-01'::date);