PostgreSQL – Performance of Rare SELECT vs Frequent INSERT in Timeseries Data

index-tuning, partitioning, performance, postgresql, postgresql-performance

I have a simple timeseries table:

CREATE TABLE movement_history (
    data_id serial,
    item_id character varying(8),
    event_time timestamp without time zone,
    location_id character varying(7),
    area_id character varying(2)
);

My frontend developer tells me the query cost is too high when he wants to know where an item was at a given timestamp, because he has to sort the table. He wants me to add a second timestamp column holding the time of the next event so he can avoid the sort. But that would more than double the cost of inserting a new movement: I would have to query for the item's previous entry, update it, and then insert the new row.
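For reference, here is a sketch of what his proposal would cost per movement. The column name `next_event_time` and the sample values are hypothetical, but the shape of the work is not: every insert becomes an update-then-insert pair.

```sql
-- Hypothetical extra column: next_event_time timestamp without time zone.
-- Close out the item's current latest row first...
UPDATE movement_history
SET    next_event_time = '2015-01-15 10:30:00'
WHERE  item_id = 'H665AYG3'
AND    next_event_time IS NULL;

-- ...then insert the new movement (location/area values are made up).
INSERT INTO movement_history (item_id, event_time, location_id, area_id)
VALUES ('H665AYG3', '2015-01-15 10:30:00', 'LOC0001', 'A1');
```

The UPDATE itself needs an index lookup on `item_id` to be cheap, and it also turns an append-only table into one that takes updates, with the bloat and vacuum traffic that implies.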

My inserts of course far outnumber his queries. And I have never seen a timeseries table that included a column for the time of the next event. He tells me my table is broken because his infrequent query requires a sort. Any suggestions?

I don't know what query he is using but I would be doing this:

select * from movement_history
where event_time <= '2015-01-15'::timestamp
and item_id = 'H665AYG3'
order by event_time desc limit 1;

We currently have about 15K items, and each is entered into the database at most once a day. However, we will soon have 50K items with sensor data that updates every 1 to 5 minutes.

I do not see his query being run very often, but another query, to get the current status of the pallets, will be:

select distinct on (item_id) * 
from movement_history 
order by item_id, event_time desc;

This server currently runs 9.3, but it could be upgraded to 9.4 if that helps.

Best Answer

Create an index on (item_id, event_time).

With that index, PostgreSQL can jump to the specified item_id, then to the specified event_time within that item_id, and read a single row backward from there. No sort is involved.
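A minimal sketch of the suggested index (the index name is illustrative):

```sql
CREATE INDEX movement_history_item_time_idx
    ON movement_history (item_id, event_time DESC);
```

A plain ascending `(item_id, event_time)` index would serve the point-in-time lookup just as well, since PostgreSQL can scan a B-tree index backward; the `DESC` form shown here also matches the `ORDER BY item_id, event_time DESC` of the `DISTINCT ON` status query exactly, so one index can plausibly cover both.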