PostgreSQL – Computation of Non-Selected Calculated Columns in View

postgresqlview

I'm trying to understand the performance impact of selecting data from a view, where one of the columns in a view is function of other data in the original table.

Does the computation get performed irrespective of whether or not the computed column is in the list of selected columns?

If I had a table and the view declared like so

CREATE TABLE price_data (
    ticker     text,          -- Ticker of the stock
    ddate      date,          -- Date for this price
    price      float8,        -- Closing price on this date
    factor     float8         -- Factor to convert this price to USD
);

CREATE VIEW prices AS 
    SELECT ticker, 
           ddate,
           price,
           factor,
           price * factor as price_usd
    FROM price_data

Would that multiplication be performed in a query like the one below?

select ticker, ddate, price, factor from prices

Is there a reference that guarantees this one way or the other? I was reading the documentation on the rule system in Postgres, but I think the answer really lies with the optimiser, since nothing in the rule system documentation indicated that it wouldn't be selected.

I suspect in the above case the computation isn't performed. I changed the view to use division instead of multiplication, and inserted a 0 for factor into price_data. The query above didn't fail, but if the query was modified to select the computed column the modified query failed.

Is there any way to understand what computations are being done when a select is carried out? I guess I'm looking for something like EXPLAIN but which also tells me about the computation(s) that are being performed.

Best Answer

As @Laurenz said, your analysis is correct: the optimiser will avoid evaluating column expressions which do not affect the outcome of the query (and your attempt to force a divide-by-zero error is proof of this).

This depends on which columns you're selecting, but it also depends on the volatility category of the column expressions. The optimiser is free to omit immutable and stable function calls if their output is never used, as they cannot affect the outcome, but volatile functions might have side-effects, so they are not so readily optimised away.

For example:

create function stable_function() returns int as $$
begin
  raise notice 'stable_function() called';
  return 1;
end
$$
language plpgsql stable;

create function volatile_function() returns int as $$
begin
  raise notice 'volatile_function() called';
  return 1;
end
$$
language plpgsql volatile;

create view v as
  select stable_function(), volatile_function();

If only the volatile column is selected:

test=# explain (analyse, verbose) select volatile_function from v;
NOTICE:  volatile_function() called
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Subquery Scan on v  (cost=0.00..0.27 rows=1 width=4) (actual time=0.057..0.057 rows=1 loops=1)
   Output: v.volatile_function
   ->  Result  (cost=0.00..0.26 rows=1 width=8) (actual time=0.056..0.056 rows=1 loops=1)
         Output: NULL::integer, volatile_function()

...then as you can see, stable_function() is absent from the explain output, and the lack of a NOTICE confirms that this call has been optimised away.

However, if the stable column is selected instead:

test=# explain (analyse, verbose) select stable_function from v;
NOTICE:  stable_function() called
NOTICE:  volatile_function() called
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Subquery Scan on v  (cost=0.00..0.52 rows=1 width=4) (actual time=0.139..0.139 rows=1 loops=1)
   Output: v.stable_function
   ->  Result  (cost=0.00..0.51 rows=1 width=8) (actual time=0.138..0.138 rows=1 loops=1)
         Output: stable_function(), volatile_function()

...then we see both column expressions appearing in the plan, and the NOTICEs show that both functions have been executed.

There doesn't seem to be any explicit mention of this behaviour in the docs, so there are no hard guarantees as to whether or not an expression will be evaluated, and you should not rely on any side-effects your function calls might have.

But if your only concern is performance, then as long as you mark your functions as stable or immutable where appropriate, you can be reasonably sure (especially in simple cases like this) that they will not be evaluated unless they're needed.

(And while you're there auditing your volatility declarations, you might want to set the parallel safety flags as well.)