I have a rahter long subquery which I store as a view
CREATE VIEW scans_minmax AS
SELECT tmp.source,
value ->> 'Ok' AS "Ok",
value ->> 'NotOk' AS "NotOk",
value ->> 'TotalCount' AS "TotalCount",
(CASE WHEN scans.timestamp = tmp.min_t THEN 'MIN' ELSE 'MAX' END) AS "Aggregation"
FROM (SELECT source, MIN(timestamp) AS min_t, MAX(timestamp) AS max_t
FROM scans
WHERE timestamp > NOW() - INTERVAL '30 days'
AND value ->> 'Ok' IS NOT NULL
AND value ->> 'NotOk' IS NOT NULL
AND value ->> 'TotalCount' IS NOT NULL
GROUP BY source) AS tmp,
scans
WHERE scans.source = tmp.source
AND (scans.timestamp = tmp.min_t OR scans.timestamp = tmp.max_t)
This gives me a nice table as result
| source | Ok | NotOk | TotalCount | Aggregation |
--------------------------------------------------
| A | a | b | c | MIN |
| A | x | y | z | MAX |
But as I wanted the table also "transposed" (MIN / MAX as columns next to each other for each value) I created the query
SELECT min_tbl.source,
min_tbl."Ok" AS "min_ok", max_tbl."Ok" AS "max_ok",
min_tbl."NotOk" AS "min_not_ok", max_tbl."NotOk" AS "max_not_ok",
min_tbl."TotalCount" AS "min_total_count", max_tbl."TotalCount" AS "max_total_count"
FROM scans_minmax AS min_tbl JOIN
scans_minmax AS max_tbl ON min_tbl.source = max_tbl.source
WHERE min_tbl."Aggregation" = 'MIN' AND max_tbl."Aggregation" = 'MAX'
which does exactly the right thing.
Now, my issue is the following:
The View above has 16 entries (there are 8 different entries for source in my DB). So the second query should take roughly the same time as a simple SELECT * FROM scans_minmax
.
But it turns out that it takes more time, as the optimizer does not perform the expaneded view query once and simply manipulate this temporary table but expands the view two times and thus does the expensive aggregation two times.
EXPLAIN ANALYZE on the Query above gives
Nested Loop (cost=173527.09..173543.19 rows=1 width=231) (actual time=2594.697..2594.852 rows=8 loops=1)
-> Merge Join (cost=173518.21..173518.28 rows=1 width=617) (actual time=2594.580..2594.610 rows=8 loops=1)
Merge Cond: ((scans_1.source)::text = (scans_3.source)::text)
-> Sort (cost=86883.50..86883.50 rows=1 width=562) (actual time=1292.558..1292.561 rows=8 loops=1)
Sort Key: scans_2.source
Sort Method: quicksort Memory: 31kB
-> Nested Loop (cost=86643.23..86883.49 rows=1 width=562) (actual time=1292.406..1292.519 rows=8 loops=1)
-> HashAggregate (cost=86634.35..86634.45 rows=10 width=55) (actual time=1292.281..1292.285 rows=8 loops=1)
Group Key: scans_2.source
-> Index Scan using scans_timestamp_index on scans scans_2 (cost=0.43..83024.66 rows=481292 width=47) (actual time=0.174..1036.297 rows=545244 loops=1)
Index Cond: ("timestamp" > (now() - '30 days'::interval))
Filter: (((value ->> 'Ok'::text) IS NOT NULL) AND ((value ->> 'NotOk'::text) IS NOT NULL) AND ((value ->> 'TotalCount'::text) IS NOT NULL))
Rows Removed by Filter: 98
-> Bitmap Heap Scan on scans scans_1 (cost=8.88..24.88 rows=1 width=531) (actual time=0.025..0.026 rows=1 loops=8)
Recheck Cond: (("timestamp" = (min(scans_2."timestamp"))) OR ("timestamp" = (max(scans_2."timestamp"))))
Filter: (((scans_2.source)::text = (source)::text) AND (CASE WHEN ("timestamp" = (min(scans_2."timestamp"))) THEN 'MIN'::text ELSE 'MAX'::text END = 'MAX'::text))
Rows Removed by Filter: 1
Heap Blocks: exact=16
-> BitmapOr (cost=8.88..8.88 rows=4 width=0) (actual time=0.018..0.018 rows=0 loops=8)
-> Bitmap Index Scan on asdf (cost=0.00..4.44 rows=2 width=0) (actual time=0.010..0.010 rows=1 loops=8)
Index Cond: ("timestamp" = (min(scans_2."timestamp")))
-> Bitmap Index Scan on asdf (cost=0.00..4.44 rows=2 width=0) (actual time=0.006..0.006 rows=1 loops=8)
Index Cond: ("timestamp" = (max(scans_2."timestamp")))
-> Sort (cost=86634.71..86634.74 rows=10 width=55) (actual time=1302.016..1302.016 rows=8 loops=1)
Sort Key: scans_3.source
Sort Method: quicksort Memory: 26kB
-> HashAggregate (cost=86634.35..86634.45 rows=10 width=55) (actual time=1301.967..1301.969 rows=8 loops=1)
Group Key: scans_3.source
-> Index Scan using scans_timestamp_index on scans scans_3 (cost=0.43..83024.66 rows=481292 width=47) (actual time=0.080..1042.757 rows=545244 loops=1)
Index Cond: ("timestamp" > (now() - '30 days'::interval))
Filter: (((value ->> 'Ok'::text) IS NOT NULL) AND ((value ->> 'NotOk'::text) IS NOT NULL) AND ((value ->> 'TotalCount'::text) IS NOT NULL))
Rows Removed by Filter: 98
-> Bitmap Heap Scan on scans (cost=8.88..24.88 rows=1 width=531) (actual time=0.023..0.024 rows=1 loops=8)
Recheck Cond: (("timestamp" = (min(scans_3."timestamp"))) OR ("timestamp" = (max(scans_3."timestamp"))))
Filter: (((scans_3.source)::text = (source)::text) AND (CASE WHEN ("timestamp" = (min(scans_3."timestamp"))) THEN 'MIN'::text ELSE 'MAX'::text END = 'MIN'::text))
Rows Removed by Filter: 1
Heap Blocks: exact=16
-> BitmapOr (cost=8.88..8.88 rows=4 width=0) (actual time=0.016..0.016 rows=0 loops=8)
-> Bitmap Index Scan on asdf (cost=0.00..4.44 rows=2 width=0) (actual time=0.009..0.009 rows=1 loops=8)
Index Cond: ("timestamp" = (min(scans_3."timestamp")))
-> Bitmap Index Scan on asdf (cost=0.00..4.44 rows=2 width=0) (actual time=0.006..0.006 rows=1 loops=8)
Index Cond: ("timestamp" = (max(scans_3."timestamp")))
Planning time: 2.253 ms
Execution time: 2594.994 ms
Why is that so and are there ways to avoid this behavior?
All I want the optimizer to do is to evaluate the VIEW as temporary table and do the join in my second query on this temporary table.
I am using PostgreSQL 9.6.15.
Thanks already!
Best Answer
This is expected behaviour: the view source is simply substituted for its name by the query rewrite, then analysed and executed as usual.
To tell the query compiler that you want to process the view only once you'd use a common table expression (CTE):
Postgres will materialise results of the CTE (prior to Postgres 12 anyway), causing your aggregation to execute only once.
Note that in general you need to exercise caution when using CTEs: