PostgreSQL – How to Find Which Functions Are Used?

functionspostgresqlpostgresql-9.5statistics

I have a legacy PostgreSQL database schema with hundreds of functions. I know that some of them aren't used anymore, and I want to delete them.

What is the easiest way to find which procedures are executed sometimes and which aren't? Is there any kind of statistics about procedure usage in PostgreSQL?

Used engine version is PostgreSQL 9.5.

Best Answer

Yes, use the statistics collector. The manual:

PostgreSQL's statistics collector is a subsystem that supports collection and reporting of information about server activity. [...] It can also count calls to user-defined functions and the total time spent in each one.

[...]

The parameter track_functions enables tracking of usage of user-defined functions.

Bold emphasis mine.

track_functions (enum)
Enables tracking of function call counts and time used. Specify pl to track only procedural-language functions, all to also track SQL and C language functions. The default is none, which disables function statistics tracking. Only superusers can change this setting.

Note: SQL-language functions that are simple enough to be "inlined" into the calling query will not be tracked, regardless of this setting.

So this is off by default. If all relevant functions are LANGUAGE plpgsql, it's enough to set it to pl in postgresql.conf and reload. No need to restart the server for this setting.

track_functions = pl

Reload with pg_ctl reload. Or on a typical Debian(-related) installation:

sudo -u postgres pg_ctlcluster 12 main reload

Postgres starts to gather information. After an adequate period of time, check track records. Careful though, some important functions may not be called in weeks. Your call ...

How to access gathered statistics? The manual again:

When statistics collection is enabled, the data that is produced can be accessed via the pg_stat and pg_statio family of system views. Refer to Chapter 27 for more information.

Basically:

SELECT * FROM pg_stat_user_functions;

Or you can easily look it up with pgAdmin (or another GUI). In pgAdmin (3 or 4), select the function and look at the "Statistics" tab top right: "Number of Calls".

Aside:
There are no true "stored procedures" before Postgres 11, just functions. The term "procedure" is often used synonymously.

Related Solutions

Postgresql – How simplify/optimize a search of keywords in two tables at the same time

The default Postgresql configuration will be used.

The default PostgreSQL configuration is meant to be adequate for smallish servers. If you are worried about scalability, which you seem to be, then restricting yourself to using the default configuration is going to be a poor plan.

WHERE p.name ILIKE ANY(potential_parts)

You are probably going to want a trgm_ops index on p.name, and similarly on T.name

You might also need to add user-space code which prevents people from searching on things of less than 3 or 4 characters.

We can consider that each table will have more than a million of row.

Make a million rows of dummy data, and see what happens. No use speculating when you can just do it.

SQL Server – Identifying Resampled Partitions in Incremental Statistics

My first question would be to ask why you're actually using incremental in the first place. Here's an answer that I posted regarding incremental statistics, a blog post by Erin Stellato that illuminates one of the primary complaints and pitfalls with incremental statistics (they're not used at the partition level by the optimizer), and two blog posts by me that work through evaluating any potential use case for incremental statistics.

Having said that, to know when a partition's stats have been sampled you can use an undocumented DMF (sys.dm_db_stats_properties_internal()) to get partition-level information. I have a comment on this blog post that describes how to understand the hierarchy at a fairly high level.

select 
    sysdatetime(),                          
    schema_name = sh.name,
    table_name = t.name,
    stat_name = s.name,
    index_name = i.name,
    leading_column = index_col(quotename(sh.name)+'.'+quotename(t.name),s.stats_id,1),
    s.stats_id,
    parition_number = isnull(sp.partition_number,1),
    s.has_filter,                       
    s.is_incremental,
    s.auto_created,
    sp.last_updated,    
    sp.rows,
    sp.rows_sampled,                        
    sp.unfiltered_rows,
    modification_counter = coalesce(sp.modification_counter, n1.modification_counter) 
from sys.stats s 
join sys.tables t 
    on s.object_id = t.object_id
join sys.schemas sh
    on t.schema_id = sh.schema_id
left join sys.indexes i 
    on s.object_id = i.object_id
        and s.name = i.name
cross apply sys.dm_db_stats_properties_internal(s.object_id, s.stats_id) sp
outer apply sys.dm_db_stats_properties_internal(s.object_id, s.stats_id) n1
where n1.node_id = 1
    and (
            (is_incremental = 0)
               or
            (is_incremental = 1 and sp.partition_number is not null)
         )
    and t.name = '<<TABLENAME>>'
    and s.name like '<<STATNAME>>%'
order by s.stats_id,isnull(sp.partition_number,1);

Best Answer

Related Solutions

Postgresql – How simplify/optimize a search of keywords in two tables at the same time

SQL Server – Identifying Resampled Partitions in Incremental Statistics

Related Question