PostgreSQL – Efficiently Store Time-Based Data

performance · postgresql · time-series-database

As a reporting solution, I have a large table with ~50M records called "reports". Currently, I am using PostgreSQL's table partitioning feature, partitioned day by day (a table named reports_20170101 holds all records for 2017-01-01).

An example query (runs in 41 seconds):

SELECT to_char(date_trunc('week', rpt_datetime), 'YYYY-WW') AS date_week,
       SUM(rpt_revenue) AS revenue
FROM reports
WHERE rpt_datetime < ?
GROUP BY date_week;

EXPLAIN ANALYZE result:

https://gist.github.com/onesvat/be234fbcb6c4d375f9d1dd4151d69391

If the interval is small, there is no problem, but with larger intervals it becomes too slow (20 seconds or more).

The system has 128 GB RAM, 16 threads, and 4 SSDs in RAID 0.

The system never deletes or updates records in yesterday's and older tables, so there is no need to re-query that older data on every run. Is there any database or extension that can handle these types of queries in a reasonable time?

There are indexes on most columns, including the date column; in fact, there are additional expression indexes like:

btree (date_trunc('week'::text, timezone('Europe/Istanbul'::text, rpt_datetime)))

and a constraint:

"reports_20170101_rpt_datetime_check" CHECK (rpt_datetime >= '2017-01-01 00:00:00+00'::timestamp with time zone AND rpt_datetime < '2017-01-02 00:00:00+00'::timestamp with time zone)
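With `constraint_exclusion = partition` (the default), CHECK constraints like this let the planner skip child tables whose date range cannot match the WHERE clause. A quick way to verify that pruning is happening for a given query (the date range here is illustrative) is:

```sql
-- Only partitions overlapping the WHERE range should appear in the plan;
-- if every reports_YYYYMMDD child shows up, exclusion is not working.
EXPLAIN
SELECT SUM(rpt_revenue)
FROM reports
WHERE rpt_datetime >= '2017-01-01 00:00:00+00'
  AND rpt_datetime <  '2017-01-08 00:00:00+00';
```

Note that a query with only an upper bound (`rpt_datetime < ?`) cannot exclude any older partitions, so a large interval still scans every historical child table.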

Best Answer

If the partitions become effectively read-only on a daily basis, you might look into materialized views: pre-calculate the results of these queries for the older data and store them in a materialized view that is refreshed once a day.
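A minimal sketch of that approach, using the aggregate from the question (the view and index names are illustrative, not from the original post):

```sql
-- Pre-aggregate weekly revenue for all closed (read-only) days.
CREATE MATERIALIZED VIEW reports_weekly_revenue AS
SELECT to_char(date_trunc('week', rpt_datetime), 'YYYY-WW') AS date_week,
       SUM(rpt_revenue) AS revenue
FROM reports
WHERE rpt_datetime < date_trunc('day', now())
GROUP BY 1;

-- A unique index is required for REFRESH ... CONCURRENTLY.
CREATE UNIQUE INDEX ON reports_weekly_revenue (date_week);

-- Run once per day (e.g. from cron) without blocking readers:
REFRESH MATERIALIZED VIEW CONCURRENTLY reports_weekly_revenue;
```

Reporting queries can then read the small pre-aggregated view and only need to aggregate the current day's partition live, instead of scanning ~50M rows each time.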