PostgreSQL Cache Misses – How to Identify Queries Causing Them

cache, postgresql

I have a Postgres database with a total data size of 115GB. The server has ~60GB of memory. The index cache hit rate is holding at 99%+ but the table cache hit rate has fallen to ~97%.

I am trying to identify whether there are particular queries or access patterns on our side that are contributing to the drop; if so, we may be able to optimize the app.

I have used the query below to identify tables that have a low hit rate:

SELECT relname,
  -- hit rate as a fraction; default to 1 when the table has never been read
  CASE (sum(heap_blks_hit) + sum(heap_blks_read))
    WHEN 0 THEN 1
    ELSE sum(heap_blks_hit)::numeric / (sum(heap_blks_hit) + sum(heap_blks_read))
  END AS hitrate,
  -- heap_blks_* are block counts, so convert to bytes before pretty-printing
  pg_size_pretty((sum(heap_blks_hit) + sum(heap_blks_read)) * current_setting('block_size')::bigint) AS total_read,
  pg_size_pretty(sum(heap_blks_read) * current_setting('block_size')::bigint) AS total_miss
FROM pg_statio_user_tables
GROUP BY relname
ORDER BY hitrate;

I am not sure where to go from here though. Is there a way to track if certain queries are commonly producing misses for the tables I know are low?

Best Answer

The pg_stat_statements extension does exactly what you want, giving the block hits and misses for each statement.
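As a rough sketch of how you might query it once it is set up (the extension has to be added to shared_preload_libraries and the server restarted before CREATE EXTENSION will work, and column names vary somewhat between versions):

-- one-time setup, after shared_preload_libraries = 'pg_stat_statements'
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- statements ranked by shared-buffer misses (blocks read from outside shared_buffers)
SELECT query,
       calls,
       shared_blks_hit,
       shared_blks_read,
       round(100.0 * shared_blks_hit /
             nullif(shared_blks_hit + shared_blks_read, 0), 2) AS hit_pct
  FROM pg_stat_statements
 ORDER BY shared_blks_read DESC
 LIMIT 20;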

However, I usually don't find this information all that useful. Many of the block misses are actually served by the operating system's file system cache rather than read from disk, and PostgreSQL provides no direct way to distinguish the two.

I think the best thing to do would be to turn on track_io_timing and then look at the time spent servicing those misses, rather than the raw number of misses.
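A sketch of what that could look like, assuming pg_stat_statements is already in place; track_io_timing can be turned on without a restart, and the timing column is named blk_read_time in older pg_stat_statements versions and shared_blk_read_time in recent ones, so adjust for your version:

-- enable I/O timing (small overhead on some platforms; pg_test_timing can check)
ALTER SYSTEM SET track_io_timing = on;
SELECT pg_reload_conf();

-- statements ranked by time spent reading blocks (milliseconds)
SELECT query,
       calls,
       shared_blks_read,
       blk_read_time   -- shared_blk_read_time in recent versions
  FROM pg_stat_statements
 ORDER BY blk_read_time DESC
 LIMIT 20;

With track_io_timing on, EXPLAIN (ANALYZE, BUFFERS) also reports I/O read time per plan node, which helps pin a miss-heavy query down to a specific table or index scan.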