Postgresql – Should I place calculation within immutable function

postgresqlpostgresql-performance

I have a pl/pgsql function which does some operations, and finally returns a calculated value, something like:

select MyMainFunc(col1 , col2) from my_table;

where (col1 , col2) is primary key (so they are unique pairs of values). Value returned from MyMainFunc is something like:

return round(  (2.0*param1)  / (2*param1+ param2+ param3) ,10 );    

param1 , param2 and param3 may be repeated in successive calls to MyMaiFunc.

I read that an IMMUTABLE function returns the same result for the same set of arguments without recalculating the result each time the function is called

Based on that: would creating an IMMUTABLE function for this single line of code, and calling that function within this context yield better performance (I mean by saving time needed for calculation)?

return MyInnerFunc(param1,param2,param3);

I am using PostgreSQL 11, and this is the complete main function in case anyone needs details. It calculates similarity between two topics in an ontology using Wu&Palmer algorithm

CREATE OR REPLACE FUNCTION public."WuPalmerSimilarity"(
    p_topic_code1 integer,
    p_topic_code2 integer)
    RETURNS numeric
    LANGUAGE 'plpgsql'

    COST 100
    
AS $BODY$

declare 

    v_paths1 json;
    v_paths2 json;

    v_paths_count1 integer;
    v_paths_count2 integer;

    v_index1 integer;
    v_index2 integer;

    v_lca_code integer;
    v_lca_level integer;
    v_temp_level integer;   
    v_topic1_lca_path_length integer;
    v_topic2_lca_path_length integer;   
    v_temp_path1_length integer;
    v_temp_path2_length integer;

    v_ontology_depth integer;

    v_lca record;


BEGIN

    if(p_topic_code1 = p_topic_code2) then return 1.0; end if;

    -- get topic 1 paths
    select path_agg into v_paths1 
    from cso_agg_path where topic_code = p_topic_code1;

    -- -- get topic 2 paths
    select path_agg into v_paths2 
    from cso_agg_path where topic_code = p_topic_code2;

    -- count paths
    v_paths_count1 := json_array_length(v_paths1);
    v_paths_count2 := json_array_length(v_paths2);

    -- initialization
    select max(topic_level) into v_ontology_depth from cso_agg_path;

    v_lca_code := 0;
    v_lca_level := 0;
    v_topic1_lca_path_length:= v_ontology_depth;
    v_topic2_lca_path_length:= v_ontology_depth;

    -- loop over topic 1 paths
    for v_index1 in 0..   v_paths_count1-1 loop                 

        -- loop over topic 2 paths                                             
        for v_index2 in 0..   v_paths_count2-1 loop 

            v_lca := public."LCA"(v_paths1::json->v_index1 , v_paths2::json->v_index2 , 0,null,null) ;

            v_temp_path1_length := v_lca.p_topic1_lca_path_length;
            v_temp_path2_length := v_lca.p_topic2_lca_path_length;

            if(v_lca.p_lca_code <> v_lca_code) then

                select topic_level into v_temp_level
                from cso_agg_path where topic_code = v_lca.p_lca_code;

                if( v_temp_level > v_lca_level or 
                    (v_temp_level = v_lca_level and 
                        ( (v_temp_path1_length + v_temp_path2_length) < (v_topic1_lca_path_length + v_topic2_lca_path_length) ) 
                    ) 
                  ) then 
                    v_lca_code := v_lca.p_lca_code;
                    v_lca_level := v_temp_level;    
                    v_topic1_lca_path_length := v_temp_path1_length;                
                    v_topic2_lca_path_length := v_temp_path2_length;                                                                                                                                                                                                 
                end if;

            else
                if(v_temp_path1_length < v_topic1_lca_path_length) then 
                    v_topic1_lca_path_length := v_temp_path1_length;  end if;                                                                                
                if(v_temp_path2_length < v_topic2_lca_path_length) then
                    v_topic2_lca_path_length := v_temp_path2_length;  end if;                                                                                                                                                            
            end if;


        end loop;

    end loop;

                                                                                                                                                                                                                                                                                                                                                                                                
   -- ***** THIS IS THE LINE I AM ASKING ABOUT ************
    return round(  (2.0*v_lca_level)  / (2*v_lca_level +v_topic1_lca_path_length+v_topic2_lca_path_length ) ,10 );  

    -- MY OTHER CHOICE IS USING THIS IMMUTABLE FUNCTION :

    --return public."WuPalmer"(v_lca_level , v_topic1_lca_path_length , v_topic2_lca_path_length );
     
   -- ************************************* --

    end; 

$BODY$;

Given that:

CREATE OR REPLACE FUNCTION public."WuPalmer"(
p_lca_level integer,
p_topic1_lca_path_length integer,
p_topic2_lca_path_length integer)

RETURNS numeric
LANGUAGE 'plpgsql'

COST 100
IMMUTABLE 

AS $BODY$

 BEGIN                                                                                                                                                                                                                                                                                   

 return round(  (2.0*p_lca_level)  / (2*p_lca_level +p_topic1_lca_path_length+p_topic2_lca_path_length ) ,10 );                                                                                                                                                                       
                                                                                                 
 END; 

$BODY$;

Best Answer

You should declare the function STABLE, that means that it is guaranteed to return the same values for the same argument during the execution of a single SQL statement.

That is because the function selects from the database, so its values can change from one SQL statement to the next, but every invocation within the same statement will return the same results, because they share the same snapshot of the database.

For performance, STABLE is almost as good as IMMUTABLE, except that the function cannot be evaluated a query planning time, but at execution time (which might make a difference for prepared statements).

See this example:

PREPARE stmt AS
SELECT id FROM atable WHERE col1 = myfunc(1, 42);

EXECUTE stmt;
EXECUTE stmt;
EXECUTE stmt;

Let's assume atable has 100000 rows.

  • if myfunc is IMMUTABLE, it will be executed only once, when the prepared statement is planned

  • if myfunc is STABLE, it will be executed three times, once for each execution of the prepared statement

  • if myfunc is VOLATILE, it will be executed 300000 times, once for each row of atable that is scanned