PostgreSQL – Computation of Non-Selected Calculated Columns in View

postgresqlview

I'm trying to understand the performance impact of selecting data from a view, where one of the columns in a view is function of other data in the original table.

Does the computation get performed irrespective of whether or not the computed column is in the list of selected columns?

If I had a table and the view declared like so

CREATE TABLE price_data (
    ticker     text,          -- Ticker of the stock
    ddate      date,          -- Date for this price
    price      float8,        -- Closing price on this date
    factor     float8         -- Factor to convert this price to USD
);

CREATE VIEW prices AS 
    SELECT ticker, 
           ddate,
           price,
           factor,
           price * factor as price_usd
    FROM price_data

Would that multiplication be performed in a query like the one below?

select ticker, ddate, price, factor from prices

Is there a reference that guarantees this one way or the other? I was reading the documentation on the rule system in Postgres, but I think the answer really lies with the optimiser, since nothing in the rule system documentation indicated that it wouldn't be selected.

I suspect in the above case the computation isn't performed. I changed the view to use division instead of multiplication, and inserted a 0 for factor into price_data. The query above didn't fail, but if the query was modified to select the computed column the modified query failed.

Is there any way to understand what computations are being done when a select is carried out? I guess I'm looking for something like EXPLAIN but which also tells me about the computation(s) that are being performed.

Best Answer

As @Laurenz said, your analysis is correct: the optimiser will avoid evaluating column expressions which do not affect the outcome of the query (and your attempt to force a divide-by-zero error is proof of this).

This depends on which columns you're selecting, but it also depends on the volatility category of the column expressions. The optimiser is free to omit immutable and stable function calls if their output is never used, as they cannot affect the outcome, but volatile functions might have side-effects, so they are not so readily optimised away.

For example:

create function stable_function() returns int as $$
begin
  raise notice 'stable_function() called';
  return 1;
end
$$
language plpgsql stable;

create function volatile_function() returns int as $$
begin
  raise notice 'volatile_function() called';
  return 1;
end
$$
language plpgsql volatile;

create view v as
  select stable_function(), volatile_function();

If only the volatile column is selected:

test=# explain (analyse, verbose) select volatile_function from v;
NOTICE:  volatile_function() called
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Subquery Scan on v  (cost=0.00..0.27 rows=1 width=4) (actual time=0.057..0.057 rows=1 loops=1)
   Output: v.volatile_function
   ->  Result  (cost=0.00..0.26 rows=1 width=8) (actual time=0.056..0.056 rows=1 loops=1)
         Output: NULL::integer, volatile_function()

...then as you can see, stable_function() is absent from the explain output, and the lack of a NOTICE confirms that this call has been optimised away.

However, if the stable column is selected instead:

test=# explain (analyse, verbose) select stable_function from v;
NOTICE:  stable_function() called
NOTICE:  volatile_function() called
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Subquery Scan on v  (cost=0.00..0.52 rows=1 width=4) (actual time=0.139..0.139 rows=1 loops=1)
   Output: v.stable_function
   ->  Result  (cost=0.00..0.51 rows=1 width=8) (actual time=0.138..0.138 rows=1 loops=1)
         Output: stable_function(), volatile_function()

...then we see both column expressions appearing in the plan, and the NOTICEs show that both functions have been executed.

There doesn't seem to be any explicit mention of this behaviour in the docs, so there are no hard guarantees as to whether or not an expression will be evaluated, and you should not rely on any side-effects your function calls might have.

But if your only concern is performance, then as long as you mark your functions as stable or immutable where appropriate, you can be reasonably sure (especially in simple cases like this) that they will not be evaluated unless they're needed.

(And while you're there auditing your volatility declarations, you might want to set the parallel safety flags as well.)

Related Solutions

Db2 – About creating a View With additional coumns (that are not in the table)

Can you not just create a new view that is

SELECT *, 0 AS "ABC"
FROM VIEW_IRS1

Or if you mean the text 'False'

SELECT *, 'FALSE' AS "ABC"
FROM VIEW_IRS1

Or to edit the existing view:

SELECT 
    MBR.PRSN_MBRSH_ID
    , Atr.Ssn_Tx
    , Names.First_Na
    , Names.Middle_Na
    , Names.Last_Na
    , Reff.Ref_Cd_Desc_Tx Suffix_Na
    , 'FALSE' AS "My Really Long Name"
FROM 
    AllYourOtherTables

Sql-server – Selecting from a view shows the wrong set of columns

First, stop using SELECT * in your views. I talk about this quite a bit here:

Bad habits to kick : using SELECT * / omitting the column list

Next, run sp_refreshview or sp_refreshsqlmodule against each view that references a table (or another view!) that you have changed, e.g.:

EXEC sp_refreshview N'dbo.viewname';

If you want to generate a script that refreshes all of the views in the system:

DECLARE @sql NVARCHAR(MAX) = N'';

SELECT @sql += N'
  EXEC sp_refreshsqlmodule N''' 
  + QUOTENAME(s.name) + '.' + QUOTENAME(v.name) + ''';'
FROM sys.schemas AS s 
INNER JOIN sys.views AS v 
ON s.[schema_id] = v.[schema_id]; 

EXEC sp_executesql @sql;

(You may want to run it twice in case there are circular or out-of-order dependencies.)

But this is a temporary fix. You will have to repeat this every time you change any table that might have views that reference it, then go find out which views are involved. This is not the right way to approach schema design. AT ALL. I wrote about a DDL trigger that would help automate this for you, but this is really the wrong approach - just stop using SELECT * in your views!

Keeping sysdepends up to date in SQL Server 2008

You may also want to consider adding WITH SCHEMABINDING to the definition of all of your views. This does two things:

Prevents you from using SELECT * in the view. This is A GOOD THING™.
Prevents you from changing underlying tables without knowing exactly which views would be affected by the change. This is also A GOOD THING™.

Best Answer

Related Solutions

Db2 – About creating a View With additional coumns (that are not in the table)

Sql-server – Selecting from a view shows the wrong set of columns

Related Question