Postgresql – Multiple concurrent “REFRESH MATERIALIZED VIEW”: how to manage

materialized-viewpostgresqlpostgresql-9.6

I have a large Postgres database with many tables, some with tens of millions of rows. Several working processes update the database concurrently. For faster searches, relevant data is compiled into a materialized view.

There are possibly multiple parallel processes writing to the database, then refreshing the materialized view afterwards. However, as the "REFRESH MATERIALIZED VIEW" query takes at least several minutes, quite often such queries pile up in a queue, and they all execute one after the other.

Unfortunately in such cases, only the latest query is of any relevance; all the previous queries consume processing time in vain to refresh stale data. Is there a way for stopping already running calls to "REFRESH MATERIALIZED VIEW" when a new call is issued?

Please note that "REFRESH MATERIALIZED VIEW CONCURRENTLY" has the same behaviour, but considerably slows the refresh (from a few minutes up to an hour), therefore aggravating the performance problem.

Of course it's possible to test for an existing lock on the view for each new query, so it's easy to cancel new queries; the problem is that I'd rather cancel the old queries and keep only the latest one…

Best Answer

It looks like you want to refresh the materialized views whenever the data in the tables change. If you do that with materialized views, it will take a long time, and updates will block each other and queries.

Maybe you can build your own “on commit refresh” materialized views as tables.

A simple example:

Instead of

CREATE MATERIALIZED VIEW sum_eumel AS
SELECT eumel_category,
       sum(eumel_data) AS eumel_sum
FROM eumel
GROUP BY eumel_category;

you could do this:

BEGIN;

CREATE TABLE sum_eumel (
   eumel_category text NOT NULL PRIMARY KEY,
   eumel_sum bigint NOT NULL DEFAULT 0
);

CREATE FUNCTION eumel_trigger() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   IF TG_OP IN ('UPDATE', 'DELETE') THEN
      /*
       * Will leave rows with value 0 after the last
       * row for a category has been deleted.
       */
      UPDATE sum_eumel
      SET eumel_sum = eumel_sum - OLD.eumel_data
      WHERE eumel_category = OLD.eumel_category;
   END IF;

   IF TG_OP IN ('INSERT', 'UPDATE') THEN
      INSERT INTO sum_eumel (eumel_category, eumel_sum)
      VALUES (NEW.eumel_category, NEW.eumel_data)
      ON CONFLICT TO UPDATE
      SET eumel_sum = sum_eumel.eumel_sum + EXCLUDED.eumel_data
   END IF;

   IF TG_OP = 'TRUNCATE' THEN
      TRUNCATE sum_eumel;
      RETURN NULL;
   END IF;

   IF TG_OP = 'DELETE' THEN
      RETURN OLD;
   ELSE
      RETURN NEW;
   END IF;
END;$$;

CREATE TRIGGER eumel_dml_trig
   AFTER INSERT OR UPDATE OR DELETE ON eumel
   FOR EACH ROW EXECUTE PROCEDURE eumel_trigger();

CREATE TRIGGER eumel_truncate_trig
   AFTER TRUNCATE ON eumel
   FOR EACH STATEMENT EXECUTE PROCEDURE eumel_trigger();

INSERT INTO sum_eumel
SELECT eumel_category,
       sum(eumel_data) AS eumel_sum
FROM eumel
GROUP BY eumel_category;

COMMIT;