This is just intended to demonstrate the redo generation of various insert
operations rather than answer the whole question. Results on my 10g instance are not 100% deterministic, but the broad picture remained the same each time I ran through.
For the heap tables, I do not know why the insert /*+ append */
generated more redo.
testbed:
create table heap_noappend(id integer, dummy char(500));
create table heap_append(id integer, dummy char(500));
create global temporary table gtt_noappend(id integer, dummy char(500));
create global temporary table gtt_append(id integer, dummy char(500));
create global temporary table gtt_results(stage integer, val integer);
test:
insert into gtt_results(stage, val)
select 0, value from v$statname join v$sesstat using(statistic#)
where sid=sys_context('userenv','sid') and name='redo size';
insert into heap_noappend(id, dummy)
select level, 'A' from dual connect by level<1000;
insert into gtt_results(stage, val)
select 1, value from v$statname join v$sesstat using(statistic#)
where sid=sys_context('userenv','sid') and name='redo size';
insert /*+ append */ into heap_append(id, dummy)
select level, 'A' from dual connect by level<1000;
insert into gtt_results(stage, val)
select 2, value from v$statname join v$sesstat using(statistic#)
where sid=sys_context('userenv','sid') and name='redo size';
insert into gtt_noappend(id, dummy)
select level, 'A' from dual connect by level<1000;
insert into gtt_results(stage, val)
select 3, value from v$statname join v$sesstat using(statistic#)
where sid=sys_context('userenv','sid') and name='redo size';
insert /*+ append */ into gtt_append(id, dummy)
select level, 'A' from dual connect by level<1000;
insert into gtt_results(stage, val)
select 4, value from v$statname join v$sesstat using(statistic#)
where sid=sys_context('userenv','sid') and name='redo size';
result:
select *
from( select decode(stage,1,'heap noappend',
2,'heap append',
3,'gtt noappend',
4,'gtt append') as operation,
val-lag(val) over(order by stage) as redo
from gtt_results)
where redo is not null;
OPERATION REDO
------------- ----------------------
heap noappend 606932
heap append 690768
gtt noappend 41488
gtt append 256
You can always implement your own table serving as "materialized view". That's how we did it before MATERIALIZED VIEW was implemented in Postgres 9.3.
You can create a plain VIEW:
CREATE VIEW graph_avg_view AS
SELECT xaxis, AVG(value) AS avg_val
FROM graph
GROUP BY xaxis;
And materialize the result once or whenever you need to start over:
CREATE TABLE graph_avg AS
SELECT * FROM graph_avg_view;
(Or use the SELECT statement directly, without creating a VIEW.)
Then, depending on undisclosed details of your use case, you can DELETE / UPDATE / INSERT changes manually.
A basic DML statement with data-modifying CTEs for your table as is, assuming nobody else tries to write to graph_avg concurrently (reading is no problem):
WITH del AS (
DELETE FROM graph_avg t
WHERE NOT EXISTS (SELECT FROM graph_avg_view WHERE xaxis = t.xaxis)
)
, upd AS (
UPDATE graph_avg t
SET avg_val = v.avg_val
FROM graph_avg_view v
WHERE t.xaxis = v.xaxis
AND t.avg_val <> v.avg_val
-- AND t.avg_val IS DISTINCT FROM v.avg_val -- alt if avg_val can be NULL
)
INSERT INTO graph_avg -- no target list, whole row; a table alias is not allowed here
SELECT v.*
FROM graph_avg_view v
WHERE NOT EXISTS (SELECT FROM graph_avg WHERE xaxis = v.xaxis);
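If this refresh needs to run repeatedly (e.g. from cron or pg_cron), the same statement can be wrapped in a function. This is just a sketch; the function name refresh_graph_avg is my own, not from the question:

```sql
-- Sketch: wrap the refresh in a function so a scheduler can call it.
-- refresh_graph_avg is a hypothetical name.
CREATE OR REPLACE FUNCTION refresh_graph_avg()
  RETURNS void
  LANGUAGE sql AS
$func$
WITH del AS (
   DELETE FROM graph_avg t
   WHERE  NOT EXISTS (SELECT FROM graph_avg_view WHERE xaxis = t.xaxis)
   )
, upd AS (
   UPDATE graph_avg t
   SET    avg_val = v.avg_val
   FROM   graph_avg_view v
   WHERE  t.xaxis = v.xaxis
   AND    t.avg_val IS DISTINCT FROM v.avg_val  -- NULL-safe comparison
   )
INSERT INTO graph_avg
SELECT v.*
FROM   graph_avg_view v
WHERE  NOT EXISTS (SELECT FROM graph_avg WHERE xaxis = v.xaxis)
$func$;

-- Then, from the scheduler:
SELECT refresh_graph_avg();
```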
Basic recipe
- Add a timestamp column with default now() to your base table. Let's call it ts.
- If you have updates, add a trigger to set the current timestamp with every update that changes either xaxis or value.
- Create a tiny table to remember the timestamp of your latest snapshot. Let's call it mv:
CREATE TABLE mv (
tbl text PRIMARY KEY
, ts timestamp NOT NULL DEFAULT '-infinity'
); -- possibly more details
Create this partial, multicolumn index:
CREATE INDEX graph_mv_latest ON graph (xaxis, value)
WHERE ts >= '-infinity';
Use the timestamp of the last snapshot as predicate in your queries to refresh the snapshot with perfect index usage.
At the end of the transaction, drop the index and recreate it with the transaction timestamp replacing the timestamp in the index predicate (initially '-infinity'), which you also save to your mv table. Everything in one transaction.
Note that the partial index is great to cover INSERT and UPDATE operations, but not DELETE. To cover that, you need to consider the entire table. It all depends on exact requirements.
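A sketch of one refresh cycle under this recipe, using the names from above. Assumptions not in the original: graph_avg has a UNIQUE or PRIMARY KEY constraint on xaxis, and each touched group is recomputed in full (a true delta merge of AVG would also need per-group row counts). The timestamps are inlined as literals via dynamic SQL because Postgres only uses a partial index when the query predicate provably implies the index predicate:

```sql
DO
$do$
DECLARE
   _old timestamp := (SELECT ts FROM mv WHERE tbl = 'graph');  -- last snapshot
   _new timestamp := transaction_timestamp();                  -- this snapshot
BEGIN
   -- Recompute averages only for groups touched since the last snapshot.
   -- %L inlines _old as a literal so the partial index can be used.
   EXECUTE format($q$
      INSERT INTO graph_avg (xaxis, avg_val)
      SELECT xaxis, avg(value)
      FROM   graph
      WHERE  xaxis IN (SELECT xaxis FROM graph WHERE ts >= %L)
      GROUP  BY xaxis
      ON     CONFLICT (xaxis) DO UPDATE
      SET    avg_val = EXCLUDED.avg_val
      $q$, _old);

   -- Save the new snapshot timestamp and move the index cutoff forward:
   UPDATE mv SET ts = _new WHERE tbl = 'graph';
   EXECUTE 'DROP INDEX graph_mv_latest';
   EXECUTE format($q$
      CREATE INDEX graph_mv_latest ON graph (xaxis, value)
      WHERE  ts >= %L
      $q$, _new);
END
$do$;
```

As noted above, this does not remove groups whose rows were deleted from the base table; covering DELETE requires looking at the entire table.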
Best Answer
There is no INCREMENTAL option; it's very simple, as the docs show. Perhaps you're confusing it with CONCURRENTLY, which is about concurrency (locking), not about minimizing updates.
If you need an incremental refresh, use a table to store the results and join it against the source, inserting only the rows where NOT EXISTS, or use the newer ON CONFLICT DO UPDATE or ON CONFLICT DO NOTHING clauses of INSERT.
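Applied to the graph_avg table from the other answer, the ON CONFLICT route (Postgres 9.5+) might look like this. It assumes a UNIQUE or PRIMARY KEY constraint on xaxis (not shown in the original), and it does not remove groups that have disappeared from the source:

```sql
-- Upsert-style refresh (sketch). Requires a unique index on xaxis, e.g.:
-- ALTER TABLE graph_avg ADD PRIMARY KEY (xaxis);
INSERT INTO graph_avg (xaxis, avg_val)
SELECT xaxis, avg(value)
FROM   graph
GROUP  BY xaxis
ON     CONFLICT (xaxis) DO UPDATE
SET    avg_val = EXCLUDED.avg_val;  -- overwrite stale averages in place
```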