Postgresql – How to deal with datetime ranges in a table with separate columns for date and time

datetimepostgresqlpostgresql-9.4

In a table that stores events, the date and time are two separate columns:

CREATE TABLE events (
    pk int serial, 
    detail text, 
    ev_date date, 
    ev time without time zone
);

If I filter for dates:

ev_date BETWEEN start_date::date AND end_date::date AND 
ev_time BETWEEN start_date::time AND end_date::time 

I am missing all events within the two dates that occurred outside the time range of every day.

Thus, the start time is only relevant for the start date and the same for the end date.

Anyone with an advice on how to do it efficiently for millions of events?

Best Answer

You can use the + operator.

SELECT pk,ev_date,ev FROM events;

 pk |  ev_date   |    ev    
----+------------+----------
  1 | 2016-02-19 | 01:00:00
  2 | 2016-02-19 | 02:00:00
  3 | 2016-02-19 | 05:00:00
  4 | 2016-02-19 | 12:00:00
  5 | 2016-02-19 | 18:00:00
  6 | 2016-02-19 | 23:00:00
  7 | 2016-02-20 | 01:00:00
  8 | 2016-02-20 | 05:00:00
  9 | 2016-02-20 | 12:00:00
 10 | 2016-02-20 | 18:00:00
(10 rows)

SELECT pk, ev_date, ev 
FROM events 
WHERE (ev_date + ev) 
    BETWEEN ('2016-02-19 04:00:00') 
        AND ('2016-02-20 02:00:00');

 pk |  ev_date   |    ev    
----+------------+----------
  3 | 2016-02-19 | 05:00:00
  4 | 2016-02-19 | 12:00:00
  5 | 2016-02-19 | 18:00:00
  6 | 2016-02-19 | 23:00:00
  7 | 2016-02-20 | 01:00:00
(5 rows)

Don't forget to create the index below:

CREATE INDEX events_ts_idx ON events ((ev_date + ev));
ANALYZE events;

I've inserted many dummy rows, so I show the result of EXPLAIN:

EXPLAIN ANALYZE SELECT pk, ev_date, ev FROM events  WHERE (ev_date + ev) 
    BETWEEN ('2016-02-19 23:50:00') 
        AND ('2016-02-20 00:01:00');
                                                                            QUERY PLAN                                                                             
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using events_ts_idx on events  (cost=0.29..8.52 rows=8 width=16) (actual time=0.014..0.029 rows=42 loops=1)
   Index Cond: (((ev_date + ev) >= '2016-02-19 23:50:00'::timestamp without time zone) AND ((ev_date + ev) <= '2016-02-20 00:01:00'::timestamp without time zone))
 Planning time: 0.082 ms
 Execution time: 0.053 ms
(4 rows)

For comparison, I've created other index and tried other form:

CREATE INDEX events_ts2_idx ON events (ev_date,ev);
ANALYZE events;

EXPLAIN ANALYZE SELECT pk, ev_date, ev FROM events  WHERE (ev_date,ev) 
    BETWEEN ('2016-02-19','23:50:00') 
        AND ('2016-02-20','0:01:00');
                             QUERY PLAN
--------------------------------------------------------------------------
 Bitmap Heap Scan on events  (cost=189.50..511.36 rows=7143 width=16) (actual time=0.027..0.042 rows=42 loops=1)
   Recheck Cond: ((ROW(ev_date, ev) >=ROW('2016-02-19'::date,'23:50:00'::time without time zone)) AND (ROW(ev_date, ev) <= ROW('2016-02-20'::date, '00:01:00'::time without time zone)))
   Heap Blocks: exact=7
   ->  Bitmap Index Scan on events_ts2_idx  (cost=0.00..187.72rows=7143 width=0) (actual time=0.019..0.019 rows=42 loops=1)
         Index Cond: ((ROW(ev_date, ev) >= ROW('2016-02-19'::date,'23:50:00'::time without time zone))AND(ROW(ev_date, ev) <= ROW('2016-02-20'::date, '00:01:00'::time without time zone)))
 Planning time: 0.079 ms
 Execution time: 0.071 ms
(7 rows)

According to my investigation, my way (using + operator) is better. I recommend to compare with both ways on your machine.