I'm trying to understand the performance impact of selecting data from a view, where one of the columns in a view is function of other data in the original table.
Does the computation get performed irrespective of whether or not the computed column is in the list of selected columns?
If I had a table and the view declared like so
CREATE TABLE price_data (
ticker text, -- Ticker of the stock
ddate date, -- Date for this price
price float8, -- Closing price on this date
factor float8 -- Factor to convert this price to USD
);
CREATE VIEW prices AS
SELECT ticker,
ddate,
price,
factor,
price * factor as price_usd
FROM price_data
Would that multiplication be performed in a query like the one below?
select ticker, ddate, price, factor from prices
Is there a reference that guarantees this one way or the other? I was reading the documentation on the rule system in Postgres, but I think the answer really lies with the optimiser, since nothing in the rule system documentation indicated that it wouldn't be selected.
I suspect in the above case the computation isn't performed. I changed the view to use division instead of multiplication, and inserted a 0
for factor
into price_data
. The query above didn't fail, but if the query was modified to select the computed column the modified query failed.
Is there any way to understand what computations are being done when a select
is carried out? I guess I'm looking for something like EXPLAIN
but which also tells me about the computation(s) that are being performed.
Best Answer
As @Laurenz said, your analysis is correct: the optimiser will avoid evaluating column expressions which do not affect the outcome of the query (and your attempt to force a divide-by-zero error is proof of this).
This depends on which columns you're selecting, but it also depends on the volatility category of the column expressions. The optimiser is free to omit
immutable
andstable
function calls if their output is never used, as they cannot affect the outcome, butvolatile
functions might have side-effects, so they are not so readily optimised away.For example:
If only the
volatile
column is selected:...then as you can see,
stable_function()
is absent from theexplain
output, and the lack of aNOTICE
confirms that this call has been optimised away.However, if the
stable
column is selected instead:...then we see both column expressions appearing in the plan, and the
NOTICE
s show that both functions have been executed.There doesn't seem to be any explicit mention of this behaviour in the docs, so there are no hard guarantees as to whether or not an expression will be evaluated, and you should not rely on any side-effects your function calls might have.
But if your only concern is performance, then as long as you mark your functions as
stable
orimmutable
where appropriate, you can be reasonably sure (especially in simple cases like this) that they will not be evaluated unless they're needed.(And while you're there auditing your volatility declarations, you might want to set the parallel safety flags as well.)