postgresql,index,datetime – Search Efficiency of Datetime vs Date + Indexed Smallint in PostgreSQL


Context

We are storing weather forecasts as multiband rasters (using Postgres with PostGIS). Each forecast is identified by its date and time (either 0:00, 6:00, or 12:00). We are looking at about 20 years of data. Besides analyses conducted for research purposes, for which speed isn't of primary importance, there will also be an online dashboard that will display the latest forecast(s) on a map.

The table could have the following columns:
datetime, metadata_id, raster_data

I foresee two main use cases: (1) either getting the most recent forecast, or (2) selecting a single past forecast. The queries could look something like this:

  1. SELECT datetime, metadata_id, raster_data
    FROM myTable
    ORDER BY datetime DESC
    LIMIT 1;
  2. SELECT datetime, metadata_id, raster_data
    FROM myTable
    WHERE datetime = '2015-10-12 12:00:00'::timestamp;

Broader Context

The above situation actually applies to 6 different tables that only differ in that their data comes from different sources, and different physical variables are represented. I am therefore considering a single table with the following columns:
datetime, dataSource_physVar, metadata_id, raster_data

The different sources do have different raster extents, so I am considering using child tables.
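A minimal sketch of what that combined table could look like, using declarative list partitioning on the source/variable column (available in PostgreSQL 10+; the table, column, and partition names here are hypothetical):

```sql
-- Parent table, partitioned by data source / physical variable:
CREATE TABLE forecasts (
    datetime           timestamp NOT NULL,
    dataSource_physVar text      NOT NULL,
    metadata_id        integer,
    raster_data        raster    -- PostGIS raster type
) PARTITION BY LIST (dataSource_physVar);

-- One child table per source/variable, each with its own raster extent:
CREATE TABLE forecasts_src1_temp PARTITION OF forecasts
    FOR VALUES IN ('src1_temperature');
```

Queries against `forecasts` that filter on `dataSource_physVar` will then only touch the relevant child table.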

Question 1

Given the limited use cases, is it better to use a single datetime field, or rather a date field with an indexed smallint field for the hour?

Question 2

Given the quantity of data (365 × 3 × 20 years ≈ 22k rows), is it even worth worrying about efficiency?

Best Answer

I'm putting down my recommendations for your two questions and additional considerations for your index.

1

I'd keep the timestamp as a single field. Breaking it into a date field plus a smallint for the hour saves only 2 bytes per row (roughly 44 KB over your 22k-row estimate), and having to compare two fields for every date/time lookup isn't worth that small space gain.
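You can check the per-value sizes yourself with `pg_column_size`; a `timestamp` is 8 bytes while `date` + `smallint` is 4 + 2 = 6 bytes:

```sql
-- Compare storage sizes of the two layouts:
SELECT pg_column_size('2015-10-12 12:00:00'::timestamp) AS ts_bytes,   -- 8
       pg_column_size('2015-10-12'::date)               AS date_bytes, -- 4
       pg_column_size(12::smallint)                     AS hour_bytes; -- 2
```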

2

22k rows may not be many compared to some databases, but if you're picking out individual records you still want those lookups to be as fast as possible. Finding a specific value in a heap requires a full table scan, which takes time; if the column is indexed, the same lookup becomes an index seek, which finds the record much faster.
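For example (table and index names are illustrative), a single index on the timestamp column serves both of your queries:

```sql
-- One btree index covers both use cases:
CREATE INDEX idx_mytable_datetime ON myTable (datetime);

-- Use EXPLAIN to confirm the planner chooses an index scan
-- rather than a sequential scan for the equality lookup:
EXPLAIN
SELECT datetime, metadata_id, raster_data
FROM myTable
WHERE datetime = '2015-10-12 12:00:00'::timestamp;
```

The `ORDER BY datetime DESC LIMIT 1` query can also walk the same btree backwards and stop after one row.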

Additional

If the data were broken up into separate rows per location (e.g. '2015/10/13 00:00', 'London', 'sunny'), then I'd advise a composite key on the timestamp and the location.

I don't believe that is your case, though: it sounds like each forecast is stored as one wide row, i.e. (datetime timestamp, London (data), Paris (data), ...). For that, an index on the timestamp field alone suffices.
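For completeness, the two indexing options side by side (hypothetical table and column names):

```sql
-- Per-location rows: composite key so (timestamp, location) is unique
-- and lookups on both columns are fast:
CREATE UNIQUE INDEX idx_obs_ts_loc ON observations (datetime, location);

-- One wide row per timestamp (your case): a single-column index suffices:
CREATE INDEX idx_forecast_ts ON myTable (datetime);
```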

Hope that helps