Is it possible to refresh a materialized view incrementally in PostgreSQL i.e. only for the data that is new or has changed?
Consider this table & materialized view:
CREATE TABLE graph (
xaxis integer NOT NULL,
value integer NOT NULL,
);
CREATE MATERIALIZED VIEW graph_avg AS
SELECT xaxis, AVG(value)
FROM graph
GROUP BY xaxis
Periodically, new values are added to graph
or an existing value is updated. I want to refresh the view graph_avg
every couple of hours only for the values that have updated. However in PostgreSQL 9.3, the whole table is refreshed. This is quite time consuming. The next version 9.4 allows CONCURRENT
update but it still refreshes the entire view. With 100s of millions of rows, this takes a few minutes.
What's a good way to keep track of updated & new values and only refresh the view partially?
Best Answer
You can always implement your own table serving as "materialized view". That's how we did it before
MATERIALIZED VIEW
was implemented in Postgres 9.3.You can create a plain
VIEW
:And materialize the result once or whenever you need to start over:
(Or use the
SELECT
statement directly, without creating aVIEW
.)Then, depending on undisclosed details of your use case, you can
DELETE
/UPDATE
/INSERT
changes manually.A basic DML statement with data-modifying CTEs for your table as is:
Assuming nobody else tries to write to
graph_avg
concurrently (reading is no problem):Basic recipe
timestamp
column with defaultnow()
to your base table. Let's call itts
.xaxis
orvalue
.Create a tiny table to remember the timestamp of your latest snapshot. Let's call it
mv
:Create this partial, multicolumn index:
Use the timestamp of the last snapshot as predicate in your queries to refresh the snapshot with perfect index usage.
At the end of the transaction, drop the index and recreate it with the transaction timestamp replacing the timestamp in the index predicate (initially
'-infinity'
), which you also save to your table. Everything in one transaction.Note that the partial index is great to cover
INSERT
andUPDATE
operations, but notDELETE
. To cover that, you need to consider the entire table. It all depends on exact requirements.