PostgreSQL – Anything wrong with splitting a complicated view into many small ones?

execution-plan, postgresql, postgresql-performance, view

I have a view where each row is associated with a customer, and the columns are various computed values such as life_time_value and purchases_per_week, as well as more complicated statistical values such as probability_of_buying_premium_membership. I have around 20 such columns of varying complexity (both in terms of lines-of-code and also computational complexity), ranging from a single line of SQL to several dozen. Right now they are all in one monster view.
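For concreteness, here is a minimal sketch of the shape of such a view; the customers / purchases tables and all column names are hypothetical stand-ins, not my actual schema:

```sql
-- Hypothetical schema: customers(customer_id, ...),
--                      purchases(purchase_id, customer_id, amount, purchased_at)
CREATE VIEW customer_stats AS
SELECT c.customer_id
     , sum(p.amount)                AS life_time_value
     , count(p.purchase_id) / 52.0  AS purchases_per_week  -- simplified: assumes one year of data
     -- ... ~18 more computed columns of varying complexity ...
FROM   customers  c
LEFT   JOIN purchases p USING (customer_id)
GROUP  BY c.customer_id;
```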

Is there a downside to splitting them into multiple smaller views and joining them by customer_id?

I.e., break it down into views called customer_life_time_value, customer_purchases_per_week and so on, and then recreate the monster view by joining the 20 views? It seems like there shouldn't be a performance hit from the joining, as it's over an indexed primary key. Many of the columns/views will perform similar calculations (purchases_per_week and purchases_per_quarter would look very similar), but it seems like the DB should be smart enough to share computation when I select from the joined view.
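The split version I have in mind would look roughly like this (same hypothetical schema as above; customer_stats_split is just an illustrative name):

```sql
CREATE VIEW customer_life_time_value AS
SELECT customer_id, sum(amount) AS life_time_value
FROM   purchases
GROUP  BY customer_id;

CREATE VIEW customer_purchases_per_week AS
SELECT customer_id, count(*) / 52.0 AS purchases_per_week
FROM   purchases
GROUP  BY customer_id;

-- ... 18 more single-metric views ...

-- Recreate the monster view by joining them all back together:
CREATE VIEW customer_stats_split AS
SELECT c.customer_id
     , ltv.life_time_value
     , ppw.purchases_per_week
     -- ... one column per joined view ...
FROM   customers c
LEFT   JOIN customer_life_time_value    ltv USING (customer_id)
LEFT   JOIN customer_purchases_per_week ppw USING (customer_id);
```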

I am using Postgres, but I am interested in answers in general.

Best Answer

Is there a downside to splitting them into multiple smaller views and joining them by customer_id?

Yes, definitely. Each of the split views has to scan the whole underlying table on its own, and then you add 20 joins on top of that. The customer_id index does not help with the derived tables you are joining, because each grouped view has to be computed as its own aggregation over the full table before the join. The single SELECT can make do with a single scan over the table (or index), so it should be substantially cheaper.
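You can verify this on your own data by comparing the plans. With views shaped like the ones sketched in the question, the split variant should show a separate scan and aggregate per joined view, while the monolithic view shows a single scan (exact plans depend on your data and Postgres version):

```sql
EXPLAIN (ANALYZE, COSTS OFF) SELECT * FROM customer_stats;        -- one scan, one aggregate
EXPLAIN (ANALYZE, COSTS OFF) SELECT * FROM customer_stats_split;  -- one scan + aggregate per view, then the joins
```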

Proof: db<>fiddle here