PostgreSQL Index Optimization – Comparing Day of Week and Hour

datetimeindex-tuningpostgresqlpostgresql-9.6

I have a table of sensor observations with obs_ts timestamp, sensor_id text, sensor_val int, and to fill in gaps with data we have models by day of week and hour of day: model_id int, hour_of_day int, model_val int.

To gap fill missing values, we would join these two tables through a crossover table that is: sensor_id text, day_of_week int, model_id int

If our observations table is massive, what would be an optimal way of indexing it for joining on isodow and hour. Does indexing a timestamp also index functions like EXTRACT(isodow FROM obs_ts) or should I make those functional indexes explicit, e.g. CREATE INDEX ON observations (EXTRACT isodow FROM obs_tx). For joining on hour, would it be better to convert the hour_of_day to a timerange?

Best Answer

Per Verace's suggestion I created a test table with 10M records. The TLDR: Indexes on datetime functions need to be explicit, any conversion of that datetime column for joining/filtering will not be served by an index.

SELECT obs_ts AS weekday_indexed, obs_ts AS hour_indexed , * 
INTO index_test
FROM observations
LIMIT 10000000;

CREATE INDEX ON index_test(EXTRACT('isodow' from weekday_indexed));
CREATE INDEX ON index_test(EXTRACT('hour' FROM hour_indexed));
CREATE INDEX ON index_test(obs_ts);
ANALYSE index_test;

Table size is 1.7 GB. All of the indexes are 214MB. From my tests the index on obs_ts is never used (probably because these comparisons involve changing the data-type of the timestamp column...). The function specific indexes are used, so there is definitely a performance-index size tradeoff.

Day of Week

SELECT COUNT(1)
FROM index_test
WHERE extract('isodow' from obs_ts )=5
--3 secs 276 msec.

SELECT COUNT(1)
FROM index_test
WHERE extract('isodow' from weekday_indexed)=5
--1 secs 586 msec.

Hour of Day

SELECT COUNT(1)
FROM index_test
WHERE extract('hour' from obs_ts )=5
-- Total query runtime: 2 secs 420 msec.

EXPLAIN ANALYZE SELECT COUNT(1)
FROM index_test
WHERE obs_ts::TIME >= '05:00' AND obs_ts::TIME < '06:00'
-- Total query runtime: 2 secs 391 msec.

SELECT COUNT(1)
FROM index_test
WHERE tx::TIME @> timerange('05:00', '06:00')
-- Total query runtime: 1 secs 928 msec.

SELECT COUNT(1)
FROM index_test
WHERE extract('hour' from hour_indexed)=5
-- 779 msec.