PostgreSQL – Datetime ranges overlap

Tags: gist-index, performance, postgresql, postgresql-performance, range-types

I have a table with datetime fields start and end, and a list of (start, end) items. I need to check which items from the list overlap with data in the table.
The current query looks like this:

select br.duration from booking, (
    select tstzrange('2016-09-06 03:45:00+00', '2016-09-06 14:45:00+00') as duration 
    union select tstzrange('2016-09-06 14:45:00+00', '2016-09-06 15:45:00+00') as duration
    -- other items from my list
) as br 
where tstzrange(start, end) && br.duration

Are there any other ways to do it? Do you think it will work if I have millions of rows in the table and compare them with hundreds of items from the list?

Best Answer

I suggest a couple of important improvements for dealing with a million rows:

SELECT br.duration
FROM  (
   VALUES 
      ('[2016-09-06 03:45:00+00, 2016-09-06 14:45:00+00)'::tstzrange)  
    , ('[2016-09-06 14:45:00+00, 2016-09-06 15:45:00+00)')
      -- more items
   ) br(duration)
WHERE EXISTS (
   SELECT FROM booking
   WHERE  tstzrange(ts_start, ts_end) && br.duration
   );

If you keep providing your list of values in the needlessly verbose and expensive form SELECT ... UNION ..., at least make that UNION ALL, or Postgres will waste time trying to fold duplicates. You only need to declare column name(s) and data type(s) in the first SELECT of the UNION query.
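As a minimal sketch of that UNION ALL form, using the two ranges from the question: only the first SELECT carries the column alias, and the later branches inherit the name and type.

```sql
-- First branch declares the column name; later branches just supply values
SELECT tstzrange('2016-09-06 03:45:00+00', '2016-09-06 14:45:00+00') AS duration
UNION ALL  -- keeps duplicates, so Postgres does not spend time folding them
SELECT tstzrange('2016-09-06 14:45:00+00', '2016-09-06 15:45:00+00');
```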
But a VALUES expression is simpler and faster. Or provide an array tstzrange[] and use unnest():
- Optimizing a Postgres query with a large IN
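The unnest() variant might look roughly like this. It is a sketch that assumes the same booking table and ts_start / ts_end columns as the query above. In an application the array would typically be passed as one bound parameter rather than spelled out inline.

```sql
-- Sketch only: with a driver you would usually pass the array as $1::tstzrange[]
SELECT br.duration
FROM   unnest(ARRAY[
          '[2016-09-06 03:45:00+00, 2016-09-06 14:45:00+00)'
        , '[2016-09-06 14:45:00+00, 2016-09-06 15:45:00+00)'
       ]::tstzrange[]) AS br(duration)  -- the cast applies to every array element
WHERE  EXISTS (
   SELECT FROM booking
   WHERE  tstzrange(ts_start, ts_end) && br.duration
   );
```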
The query you have returns one row for every overlapping row in booking, while you most likely want each overlapping value from the list just once. You could add DISTINCT or GROUP BY to get unique rows, but that would still be a waste of time. An EXISTS semi-join is a much simpler and cheaper alternative for your case: each row from br is returned exactly once if an overlapping entry is found, and Postgres can stop looking further for that row.
The query would still be slow without index support. Create a functional (expression) GiST or SP-GiST index on the range expression. The latter probably performs best:
```
CREATE INDEX booking_ts_range_idx on booking USING spgist (tstzrange(ts_start, ts_end));
```
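For comparison, here is a sketch of the GiST variant and a way to check whether the planner actually uses an index. The index name booking_ts_range_gist_idx is made up for this example, and ts_start / ts_end are assumed to be timestamptz columns. The expression in the WHERE clause has to match the indexed expression for the index to be considered.

```sql
-- GiST alternative to the SP-GiST index above (hypothetical index name)
CREATE INDEX booking_ts_range_gist_idx ON booking USING gist (tstzrange(ts_start, ts_end));

-- Inspect the plan: look for an index scan on booking instead of a sequential scan
EXPLAIN (ANALYZE, BUFFERS)
SELECT br.duration
FROM  (
   VALUES ('[2016-09-06 03:45:00+00, 2016-09-06 14:45:00+00)'::tstzrange)
   ) br(duration)
WHERE EXISTS (
   SELECT FROM booking
   WHERE  tstzrange(ts_start, ts_end) && br.duration  -- same expression as in the index
   );
```

Running the same EXPLAIN with each index in place is a simple way to test which one performs better on your data.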

- Optimizing queries on a range of timestamps (two columns)
- Native way of performing this hours of operation query in PostgreSQL
- Help with adding a time filter to Postgres query
- Perform this hours of operation query in PostgreSQL (incl. SP-GiST index example and benchmark comparing alternatives)

Related Solutions

Postgresql – Best way of finding rows referencing a given id on PostgreSQL

I suggest your first option, with two improvements and some simplifications.

(
SELECT 1      -- irrelevant what you select here
FROM   client_category_price
WHERE  sellable_id = '9bc202ca-f7c1-11e2-a751-062b1fc90460'
LIMIT  1      -- may be redundant
)
UNION ALL     -- not just UNION

  ...

UNION ALL
(
SELECT 1
FROM   work_order_item
WHERE  sellable_id = '9bc202ca-f7c1-11e2-a751-062b1fc90460'
LIMIT  1
)
LIMIT  1;      -- this one is crucial

Given that all you want to know is

if any of those (table, column) have my row's id there, which would prevent its deletion.

You don't need a full list of violating rows. Stop searching at the first one.
- All you need to do is add another LIMIT 1 at the end of the query. This way, Postgres skips the rest of the query as soon as the first row is found. You probably don't need LIMIT 1 for each SELECT, just the one at the end. Test without it; that may produce different query plans.
- Use UNION ALL instead of UNION. It is faster because Postgres does not have to eliminate duplicates.
- Some other simplifications, such as selecting a constant (SELECT 1), since it is irrelevant what you select.
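If a single boolean is more convenient than "one row or no row", the same idea can be wrapped in EXISTS. This is a variation, not part of the answer above. It shows only the two tables that appear in the excerpt, and the output column name is_referenced is made up for illustration.

```sql
-- Returns true as soon as any branch yields a row, false otherwise
SELECT EXISTS (
   SELECT 1
   FROM   client_category_price
   WHERE  sellable_id = '9bc202ca-f7c1-11e2-a751-062b1fc90460'
   UNION ALL
   SELECT 1
   FROM   work_order_item
   WHERE  sellable_id = '9bc202ca-f7c1-11e2-a751-062b1fc90460'
   ) AS is_referenced;
```

EXISTS only needs one row, so no explicit LIMIT is required here. It is still worth comparing the plan against the LIMIT 1 version.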

Vertica performance on select from view with union

Vertica query performance depends heavily on the predicates used in the query.

To get a sense of the performance, find out which projection serves the columns selected by your query; running EXPLAIN on the query shows this. The columns in the ORDER BY clause of that projection are very important in determining the performance of your SELECT.

Vertica improves performance by sorting the columns and compressing them through encoding, so that they use minimal memory at run time.

Also run analyze_histogram(tablename, 100) on all your tables. This ensures that statistics are computed over the complete data, not just the 10% sample taken by analyze_statistics.
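A rough sketch of the two steps above. The names my_view, my_table and some_column are placeholders, not names from the original question.

```sql
-- Show the plan, including which projection Vertica picks for the query
EXPLAIN SELECT * FROM my_view WHERE some_column = 'some value';

-- Collect statistics over 100 percent of the rows of one table
SELECT ANALYZE_HISTOGRAM('my_table', 100);
```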

Also, since you are doing a UNION, try to keep the sort order of all the projections the same; otherwise the sort order may be meaningless after the union.
