Postgres – caching of recently inserted data

performancequery-performance

Does Postgres cache recently INSERTed data? And if so, how can I increase the amount?

Here’s why I’m asking. I have a process that continuously inserts simple text messages into a table at an average rate of several million records a day (and contains several months of data). I have a second process that reads recent records from this table for processing. I’ve noticed that if I SELECT (for example) 100 records that have arrived in the last 6 hours it might take me 1 second. However if I select 200 record from the last 12 hours it might take (typically) 8 seconds and if I select 400 records from the last 24 hours it might take more than a minute.

Or in other words, there seems to be an almost exponential increase in the time to select data as it’s age increase. In my naiveite I had assumed there would be an approximately linear relationship between the period specified in the WHERE clause, and the time taken to retrieve the data.

For the record, here’s the table definition….

CREATE TABLE message
(
  messid SERIAL UNIQUE,
  rawmessage character(250), // raw, unprocessed message (ascii string)
  loadedtime timestamp without time zone, // Time this message was loaded 
  deviceNumber   INTEGER     // unique identifier of device that generated message.
)

And here’s the sort of query I’m running (deviceNumber and loadedTime are both indexed)

SELECT * FROM message
WHERE deviceNumber = 1234567
AND loadedtime >='10-May-2018 21:00:00’
AND loadedtime <='10-May-2018 22:00:00’

The problem is that the process that reads the table occasionally falls behind schedule and needs to catch up, but because of the exponential time increase in retrieving slightly older data it has difficulty getting the backlog down.

My question comes in two parts….

  1. Why is there such a discrepancy in the time taken to select recent
    data and older data. Is it because recent INSERTS are also held in
    memory? 2)
  2. Is there anyway to improve the time to select data added
    in the last 12 hours to be close to that achieved with data only one
    hour old?

Thanks

Update – I'm will attempt to add a composite index on both loadedTime and deviceNumber as suggested (both already have individual indexes). This may take some time.
Once I've added a composite index, do I still need the individual indexes on those fields?

Best Answer

Caching by Postgres or the operating system isn't going to help much when Postgres doesn't know which of the millions of records in your table fulfill your query's condition(s). The only way to make sure this is going to be consistently performing as you expect is to use indexes.

An index on the combination of deviceNumber and loadedtime is what you need in your case.

In your question you note that a query can be relatively fast (like a second for recently inserted data), so I'm strongly suspecting you already have an index on at least one of these columns.