PostgreSQL – How to handle bad query plan caused by exact equality on range type

performance, postgresql, postgresql-9.3, query-performance, range-types

I'm performing an update where I require an exact equality on a tstzrange variable. ~1M rows are modified, and the query takes ~13 minutes. The result of EXPLAIN ANALYZE can be seen here, and the actual row counts are extremely different from those estimated by the query planner. The problem is that the planner expects the index scan on t_range to return a single row.

This seems to be related to the fact that statistics on range types are stored differently from those of other types. Looking at the pg_stats view for the column, n_distinct is -1 and other fields (e.g. most_common_vals, most_common_freqs) are empty.
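For reference, the statistics were inspected with something like this (a minimal sketch; pg_stats is the standard view over pg_statistic):

SELECT n_distinct, most_common_vals, most_common_freqs, histogram_bounds
FROM pg_stats
WHERE tablename = 'gtfs_stop_times_history'
AND attname = 't_range';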

However, there must be statistics stored on t_range somewhere. An extremely similar update, where I use a containment condition (<@) on t_range instead of an exact equality, takes about 4 minutes and uses a substantially different query plan (see here). The second query plan makes sense to me because every row in the temp table and a substantial fraction of the history table will be used. More importantly, the query planner predicts an approximately correct number of rows for the filter on t_range.

The distribution of t_range is a bit unusual. I'm using this table to store the historical state of another table, and the changes to the other table occur all at once in big dumps, so there aren't many distinct values of t_range. Here are the counts corresponding to each of the unique values of t_range:

                              t_range                              |  count  
-------------------------------------------------------------------+---------
 ["2014-06-12 20:58:21.447478+00","2014-06-27 07:00:00+00")        |  994676
 ["2014-06-12 20:58:21.447478+00","2014-08-01 01:22:14.621887+00") |   36791
 ["2014-06-27 07:00:00+00","2014-08-01 07:00:01+00")               | 1000403
 ["2014-06-27 07:00:00+00",infinity)                               |   36791
 ["2014-08-01 07:00:01+00",infinity)                               |  999753

The counts for distinct t_range above are complete, so the cardinality is ~3M (of which ~1M will be affected by either update query).
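A query along these lines reproduces the counts above (a sketch; I haven't kept the exact statement):

SELECT t_range, count(*)
FROM gtfs_stop_times_history
GROUP BY t_range
ORDER BY t_range;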

Why does query 1 perform much more poorly than query 2? In my case, query 2 is a good substitute, but if an exact range equality were truly required, how could I get Postgres to use a smarter query plan?

Table definition with indexes (dropping irrelevant columns):

       Column        |   Type    |                                  Modifiers                                   
---------------------+-----------+------------------------------------------------------------------------------
 history_id          | integer   | not null default nextval('gtfs_stop_times_history_history_id_seq'::regclass)
 t_range             | tstzrange | not null
 trip_id             | text      | not null
 stop_sequence       | integer   | not null
 shape_dist_traveled | real      | 
Indexes:
    "gtfs_stop_times_history_pkey" PRIMARY KEY, btree (history_id)
    "gtfs_stop_times_history_t_range" gist (t_range)
    "gtfs_stop_times_history_trip_id" btree (trip_id)

Query 1:

UPDATE gtfs_stop_times_history sth
SET shape_dist_traveled = tt.shape_dist_traveled
FROM gtfs_stop_times_temp tt
WHERE sth.trip_id = tt.trip_id
AND sth.stop_sequence = tt.stop_sequence
AND sth.t_range = '["2014-08-01 07:00:01+00",infinity)'::tstzrange;

Query 2:

UPDATE gtfs_stop_times_history sth
SET shape_dist_traveled = tt.shape_dist_traveled
FROM gtfs_stop_times_temp tt
WHERE sth.trip_id = tt.trip_id
AND sth.stop_sequence = tt.stop_sequence
AND '2014-08-01 07:00:01+00'::timestamptz <@ sth.t_range;

Q1 updates 999753 rows and Q2 updates 999753 + 36791 = 1036544 (i.e., the temp table is such that every row matching the time-range condition is updated).

I tried this query in response to @ypercube's comment:

Query 3:

UPDATE gtfs_stop_times_history sth
SET shape_dist_traveled = tt.shape_dist_traveled
FROM gtfs_stop_times_temp tt
WHERE sth.trip_id = tt.trip_id
AND sth.stop_sequence = tt.stop_sequence
AND sth.t_range <@ '["2014-08-01 07:00:01+00",infinity)'::tstzrange
AND '["2014-08-01 07:00:01+00",infinity)'::tstzrange <@ sth.t_range;

The query plan and results (see here) were intermediate between the two previous cases (~6 minutes).

2016/02/05 EDIT

I no longer have access to the data after 1.5 years, so I created a test table with the same structure (but no indexes) and similar cardinality. jjanes's answer proposed that the cause might be the ordering of the temporary table used for the update. I was unable to test the hypothesis directly because I don't have access to track_io_timing (I'm using Amazon RDS).

  1. The overall results were several times faster. I'm guessing that this is because of the removal of the indexes, consistent with Erwin's answer.

  2. In this test case, Queries 1 and 2 took basically the same amount of time, because they both used the merge join. That is, I was unable to reproduce whatever was causing Postgres to choose the hash join, so I still can't say why it chose the poorly-performing hash join on the original data (one way to probe that choice is sketched below).
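For what it's worth, this is a sketch of how I'd probe the join choice if I could reproduce it; plain EXPLAIN only shows the chosen plan and does not run the update:

SET enable_hashjoin = off;  -- session-local experiment: see what plan the planner falls back to

EXPLAIN
UPDATE gtfs_stop_times_history sth
SET shape_dist_traveled = tt.shape_dist_traveled
FROM gtfs_stop_times_temp tt
WHERE sth.trip_id = tt.trip_id
AND sth.stop_sequence = tt.stop_sequence
AND sth.t_range = '["2014-08-01 07:00:01+00",infinity)'::tstzrange;

RESET enable_hashjoin;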

Best Answer

The biggest difference in time in your execution plans is on the top node, the UPDATE itself. This suggests that most of your time is going to IO during the update. You could verify this by turning on track_io_timing and running the queries with EXPLAIN (ANALYZE, BUFFERS).
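A minimal sketch of that check, using your first query. Note that EXPLAIN ANALYZE actually executes the UPDATE, so wrap it in a transaction and roll back, and setting track_io_timing requires superuser, which is why it isn't available to you on RDS:

SET track_io_timing = on;  -- requires superuser; adds "I/O Timings" lines to the BUFFERS output

BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
UPDATE gtfs_stop_times_history sth
SET shape_dist_traveled = tt.shape_dist_traveled
FROM gtfs_stop_times_temp tt
WHERE sth.trip_id = tt.trip_id
AND sth.stop_sequence = tt.stop_sequence
AND sth.t_range = '["2014-08-01 07:00:01+00",infinity)'::tstzrange;
ROLLBACK;  -- EXPLAIN ANALYZE really performed the update, so undo it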

The different plans present the rows to be updated in different orders. One is in trip_id order, and the other is in whichever order the rows happen to be physically stored in the temp table.

The table being updated seems to have its physical order correlated with the trip_id column, so updating rows in that order leads to efficient IO patterns with read-ahead/sequential reads, while the physical order of the temp table seems to lead to a lot of random reads.
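You can get a rough sense of this from the correlation column in pg_stats (values near 1 or -1 mean a column's order closely tracks the physical row order). This is only a heuristic, and the temp table will only show up here if it has been analyzed:

SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename IN ('gtfs_stop_times_history', 'gtfs_stop_times_temp')
AND attname = 'trip_id';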

If you can add an ORDER BY trip_id to the statement that creates the temp table, that might solve the problem for you.
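For example, something along these lines. I don't know how gtfs_stop_times_temp is actually populated, so the source relation below is a stand-in; the ORDER BY is the point:

CREATE TEMP TABLE gtfs_stop_times_temp AS
SELECT trip_id, stop_sequence, shape_dist_traveled
FROM new_stop_times            -- stand-in for wherever the new values come from
ORDER BY trip_id;

ANALYZE gtfs_stop_times_temp;  -- give the planner statistics on the freshly built temp table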

PostgreSQL does not take the effects of IO ordering into account when planning an UPDATE operation (unlike SELECT operations, where it does). If PostgreSQL were cleverer, it would either realize that one plan produces a more efficient order, or it would insert an explicit sort node between the update and its child node so that the update is fed rows in ctid order.

You are correct that PostgreSQL does a poor job of estimating the selectivity of equality joins on ranges. However, this is only tangentially related to your fundamental problem. A more efficient query in the select portion of your update might accidentally happen to feed rows into the update proper in a better order, but if so, that is mostly down to luck.