Postgresql – Need help understanding Postgres materialized views under the hood

materialized-viewpostgresqlpostgresql-performance

postgres 9.6.20

Need some help getting an accurate mental model of whats happening with postgres materialized views.

Based on this from postgres docs (https://www.postgresql.org/docs/9.6/rules-materializedviews.html) "The information about a materialized view in the PostgreSQL system catalogs is exactly the same as it is for a table or view. So for the parser, a materialized view is a relation, just like a table or a view. When a materialized view is referenced in a query, the data is returned directly from the materialized view, like from a table; the rule is only used for populating the materialized view." (emphasis mine)

My assumptions based on the above (and what else I've been reading):

  • that making a mat view will then allow me to query against that new table
  • The query that generates that view is only run on the REFRESH call
  • That if the result of the query used to generate the mat view returns ~30000 rows, then that is how big the materialized view table will be

Something in my assumptions is wrong, and I'm trying to understand what.

I have a nice big (to me) query – some CTEs, some joins, some aggregating info into jsonb columns as my materialized view.

WITH
durations AS (SELECT DISTINCT ON(song) song, song_source_files.duration
    FROM song_source_files WHERE song_source_files.type = 'Full'
    ORDER BY song, updated_at DESC),
...<truncated more CTEs>
SELECT DISTINCT
    published_song.song,
    songs.bpm,
    ...<truncated>...
    COALESCE(durations.duration, 0) as duration,
    ...<truncated>
FROM
    published_song
    LEFT JOIN songs ON songs.id = published_song.song
    LEFT JOIN durations ON durations.song = published_song.song
    ...<truncated>...
    LEFT JOIN vocals ON vocals.song = published_song.song

CREATE UNIQUE INDEX IF NOT EXISTS "idx_{mv_name}_uuid" ON "{mv_name}" ("uuid");
...<truncated>

When I run the query, it takes about 30 seconds, and returns ~30000 rows.

In the mat view, when I query using a WHERE aimed at an indexed column, it's fast, as expected.

When I do EXPLAIN (ANALYZE, FORMAT JSON) SELECT count(*) FROM published_song_mv; this takes about 2.5 mins, and I see "Plan Rows": 12606570 and "Actual Rows": 29536.

So this is where I'm confused – my expectation is that the mat view only works with the Actual Rows. Not sure why the Plan Rows of 12 million even enters the equation (in the sense of, why it would apply on the final result set of ~30000 rows – i know i have some gnarly joins to generate the view)?

So I'm looking for help to correct the mental model I have in my head of whats happening under the hood.

Using Postgres 9.6 – can this be a factor – should I consider upgrading sooner rather than later?

Best Answer

All your assumptions are correct.

You need to ANALYZE the materialized view to get correct estimates – this should be done automatically by autovacuum, so I am not sure what could be wrong there.

It also looks like your materialized view is terribly bloated. You would need to run VACUUM (FULL) on it to fix that.

Both point in the direction of autovacuum not doing its job properly. You should investigate that.