PostgreSQL Monitoring – Components of pg_database_size

monitoringpostgresqlstatistics

I'm trying to write a munin plugin to graph DB sizes. Alongside using pg_database_size I want to graph the components thereof as well.

So far, I've come up with the following:

SELECT
    SUM(pg_relation_size(oid, 'main')) AS main_size,
    SUM(pg_relation_size(oid, 'vm')) AS vm_size,
    SUM(pg_relation_size(oid, 'fsm')) AS fsm_size,
    SUM(
        CASE reltoastrelid
        WHEN 0 THEN 0
        ELSE pg_total_relation_size(reltoastrelid)
        END
    ) AS toast_size,
    SUM(pg_indexes_size(oid)) AS indexes_size
    FROM pg_class
    WHERE reltype != 0 -- 0=indices, covered by pg_indexes_size

However, summing up those 5 values returns me something which is not the same as the result of pg_database_size. The difference seems to be less significant for larger DBs.

Example on a larger DB:

┌──────────┬────────┬─────────┬─────────┬──────────┬───────────────┬──────────────────┬─────────┐
│   main   │   vm   │   fsm   │  toast  │ indexes  │ sum_of_values │ pg_database_size │  diff   │
├──────────┼────────┼─────────┼─────────┼──────────┼───────────────┼──────────────────┼─────────┤
│ 72441856 │ 753664 │ 2392064 │ 4677632 │ 41377792 │ 116 MB        │ 111 MB           │ 5222 kB │
└──────────┴────────┴─────────┴─────────┴──────────┴───────────────┴──────────────────┴─────────┘
(1 row)

Example on a smaller DB:

┌─────────┬────────┬─────────┬────────┬─────────┬───────────────┬──────────────────┬─────────┐
│  main   │   vm   │   fsm   │ toast  │ indexes │ sum_of_values │ pg_database_size │  diff   │
├─────────┼────────┼─────────┼────────┼─────────┼───────────────┼──────────────────┼─────────┤
│ 2809856 │ 385024 │ 1351680 │ 557056 │ 2924544 │ 7840 kB       │ 6642 kB          │ 1198 kB │
└─────────┴────────┴─────────┴────────┴─────────┴───────────────┴──────────────────┴─────────┘
(1 row)

What am I missing?

Maybe related, maybe not: I'm shocked to see the index size. They are HUGE. Is something in my query wrong?


Here is a script I used to inspect the different values:

SELECT
    SUM(pg_relation_size(oid, 'main')) AS main,
    SUM(pg_relation_size(oid, 'vm')) AS vm,
    SUM(pg_relation_size(oid, 'fsm')) AS fsm,
    SUM(
        CASE reltoastrelid
        WHEN 0 THEN 0
        ELSE pg_total_relation_size(reltoastrelid)
        END
    ) AS toast,
    SUM(pg_indexes_size(oid)) AS indexes,
    pg_size_pretty(
        SUM(pg_relation_size(oid, 'main'))::bigint +
        SUM(pg_relation_size(oid, 'vm'))::bigint +
        SUM(pg_relation_size(oid, 'fsm'))::bigint +
        SUM(pg_indexes_size(oid))::bigint +
        SUM(
            CASE reltoastrelid
            WHEN 0 THEN 0
            ELSE pg_total_relation_size(reltoastrelid)
            END
        )::bigint
    ) AS sum_of_values,
    pg_size_pretty(pg_database_size(current_database())) AS pg_database_size,

    pg_size_pretty(
        SUM(pg_relation_size(oid, 'main'))::bigint +
        SUM(pg_relation_size(oid, 'vm'))::bigint +
        SUM(pg_relation_size(oid, 'fsm'))::bigint +
        SUM(pg_indexes_size(oid))::bigint +
        SUM(
            CASE reltoastrelid
            WHEN 0 THEN 0
            ELSE pg_total_relation_size(reltoastrelid)
            END
        )::bigint - pg_database_size(current_database())::bigint
    ) AS diff

FROM pg_class
WHERE reltype != 0;

Best Answer

pg_database_size() essentially queries the filesystem for the size of $PGDATA/base/oid-of-database (the per-database data directory), plus the size of every such directory in non-default tablespaces added with CREATE TABLESPACE.

If there are files in these directories that don't pertain to any relation , they will be counted too. Looking at a random database, I can see pg_internal.init, PG_VERSION, pg_filenode.map that are in that case. They add up to the directory size and will not be counted in any control query.

Stray data files may also exist. See How to reclaim space taken by an index that partially built and was terminated by a power outage for a concrete example. They also can't be found by querying against pg_class, by definition, and they're counted in the db size on disk.

Concerning the query, just for the purpose of getting the sum of objects sizes, I'd suggest a simpler version:

select sum(pg_total_relation_size(oid)) 
from pg_class
where relkind in ('r','m','S')
  and not relisshared

Note that shared relations are removed from this query (not relisshared). Shared objects are visible from all databases and don't belong to any database's directory, they're in $PGDATA/global. One of the shared tables that is expected to be significant in size is pg_shdepend. Overlooking this in your original query makes it overestimate (in addition to @jjanes comment about toast tables).