PostgreSQL – Understand PostgreSQL’s statistics about rows and times

postgresql, statistics

I'm trying to find out how much database load an application causes, so I'm looking at the statistics of a single database (only a single user works with it). The numbers I get are partially redundant, yet massively different from each other. Here's the query I use to get the data:

with pss as (
    select pd.datname,
        sum(calls) as query_count,   -- Number of executed queries   
        sum(total_time)::integer as time,   -- Total time spent executing queries, in milliseconds
        sum(rows) as rows,   -- Number of rows retrieved or affected by queries
        sum(blk_read_time)::integer as read_time,   -- Total time queries spent reading blocks, in milliseconds
        sum(blk_write_time)::integer as write_time   -- Total time queries spent writing blocks, in milliseconds
    from pg_stat_statements as pss
        left join pg_database as pd on (pd.oid = pss.dbid)
    group by pd.datname
)
select psd.numbackends,   -- Currently connected clients
    pss.query_count,
    psd.xact_commit + psd.xact_rollback as xact_count,   -- Transaction count
    psd.blks_read,   -- Disk blocks read
    psd.blks_hit,   -- Disk blocks found in the buffer cache
    pss.rows,
    psd.tup_returned,   -- Rows returned by queries
    psd.tup_fetched,   -- Rows fetched by queries
    psd.tup_inserted + psd.tup_updated + psd.tup_deleted as tup_written,   -- Rows written by queries
    psd.blk_read_time::integer,   -- Time spent reading data file blocks by backends in this database, in milliseconds
    psd.blk_write_time::integer,   -- Time spent writing data file blocks by backends in this database, in milliseconds
    pss.time,
    pss.read_time,
    pss.write_time
from pg_stat_database as psd
    join pss on (pss.datname = psd.datname) 
where psd.datname = 'myappdb';
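(Side note for anyone on a newer version: as of PostgreSQL 13, pg_stat_statements no longer has a total_time column; it was split into total_exec_time and total_plan_time, so the time line would have to become something like this:

    sum(total_exec_time + total_plan_time)::integer as time   -- Execution + planning time, in milliseconds; total_plan_time stays 0 unless pg_stat_statements.track_planning is on

The query above uses the pre-13 column name.)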

I have installed and loaded the pg_stat_statements extension and enabled the track_io_timing option. A while ago I also reset both sets of statistics at the same time, as a superuser, in this database:

select pg_stat_reset(), pg_stat_statements_reset();
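In case the exact setup matters, it was roughly this (assuming superuser access; changing shared_preload_libraries requires a server restart):

-- in postgresql.conf, followed by a restart:
shared_preload_libraries = 'pg_stat_statements'
track_io_timing = on

-- then, in the database itself:
create extension if not exists pg_stat_statements;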

The result currently looks like this:

numbackends:            28
query_count:       270,302
xact_count:        270,313
blks_read:      17,666,658
blks_hit:        5,063,072
rows:              462,105
tup_returned:   24,494,192
tup_fetched: 1,131,085,600
tup_written:        17,509
blk_read_time:      39,337
blk_write_time:      9,257
time:              229,049
read_time:          50,303
write_time:          9,261

My questions are these:

  1. Why are the numbers in rows, tup_returned and tup_fetched so different? At least rows and tup_returned should be the same, right?
  2. Why are blk_read_time and read_time a bit off while blk_write_time and write_time are so close?

Best Answer

The "rows" field only reports rows returned by the query in question. The pg_stat_database fields also count rows that had to be fetched from system catalogs in order to plan and execute your queries. pg_stat_statements doesn't track these, not even with pg_stat_statements.track=all. Since the catalog data is generally cached within a session, this can give you massive expansion in the number in pg_stat_database relative to pg_stat_statements if you only execute one small query in each session and then disconnect. These catalog reads can also explain the difference in read times.

There is more to the story than that, like whether they count input rows or output rows for aggregations and joins. Frankly I find tup_returned and tup_fetched to be useless.
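To make the input-versus-output-rows point concrete (orders is just a stand-in for one of your tables): for an aggregate like the one below, pg_stat_statements records a single output row, while with a sequential scan pg_stat_database's tup_returned grows by roughly the number of rows the scan read.

select count(*) from orders;   -- pg_stat_statements: rows = 1
                               -- pg_stat_database: tup_returned grows by roughly the table's row count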

What problem are you trying to solve?