PostgreSQL – Datetime ranges overlap

gist-indexperformancepostgresqlpostgresql-performancerange-types

I have a table with datetime fields start and end. And I have a list of (start, end) items. I need to check which items from the list overlap with data in the table.
The current query looks like this:

select br.duration from booking, (
    select tstzrange('2016-09-06 03:45:00+00', '2016-09-06 14:45:00+00') as duration 
    union select tstzrange('2016-09-06 14:45:00+00', '2016-09-06 15:45:00+00') as duration
    -- other items from my list
) as br 
where tstzrange(start, end) && br.duration

Are there any other ways to do it? Do you think it will work if I have millions rows in the table and will compare them with hundreds items from the list?

Best Answer

I suggest a couple of important improvements for dealing with a million rows:

SELECT br.duration
FROM  (
   VALUES 
      ('[2016-09-06 03:45:00+00, 2016-09-06 14:45:00+00)'::tstzrange)  
    , ('[2016-09-06 14:45:00+00, 2016-09-06 15:45:00+00)')
      -- more items
   ) br(duration)
WHERE EXISTS (
   SELECT FROM booking
   WHERE  tstzrange(ts_start, ts_end) && br.duration
   );
  • While providing your list of values with the needlessly verbose and expensive form SELECT ... UNION ..., make that UNION ALL, or Postgres will waste time trying to fold duplicates. And you would only need to declare column name(s) and data type(s) for the first SELECT of the UNION query.
    But a VALUES expression is simpler and faster. Or provide an array tstzrange[] and use unnest():

  • The query you have would return one row for every overlapping row in booking, while you probably want each overlapping value from the list once, most likely. You could add DISTINCT or GROUP BY to get unique rows, but that would still be a waste of time. An EXISTS semi-join is one of the much simpler and cheaper alternatives for your case: Each row from duration is returned exactly once if an overlapping entry is found and Postgres can stop looking further for this row.

  • The query would still be slow without index support. Create a functional GiST or SP-GiST index. The latter probably performing best:

    CREATE INDEX booking_ts_range_idx on booking USING spgist (tstzrange(ts_start, ts_end));
    

Related: