PostgreSQL – pg_database_size() is small, but pg_dump is very big

postgresql

I've run into a strange issue.
When I use the psql command "\l+ db_name" or the following SQL,

cattle_dev=# \l+ cattle_dev                                        
                                                    List of databases
    Name    |   Owner    | Encoding |   Collate   |    Ctype    | Access privileges |  Size   | Tablespace | Description 
------------+------------+----------+-------------+-------------+-------------------+---------+------------+-------------
 cattle_dev | cattle_dev | UTF8     | zh_CN.UTF-8 | zh_CN.UTF-8 |                   | 1738 MB | pg_default | 
(1 row)

cattle_dev=# select pg_size_pretty(pg_database_size('cattle_dev'));
 pg_size_pretty 
----------------
 1738 MB
(1 row)

cattle_dev=# 

it reports that the database is small.
But if I use pg_dump to back up this database, the result is very large: almost 18 GB.

[enterprisedb@ppasdev 20170605012443]$ ll -h cattle_dev_20170605012443.dmp 
-rw-rw-r--. 1 enterprisedb enterprisedb 18G Jun  5 01:34 cattle_dev_20170605012443.dmp
[enterprisedb@ppasdev 20170605012443]$ 

The question is: why is the pg_dump output so large when pg_database_size() reports such a small database?

Any help would be appreciated. Thanks.

Best Answer

One factor is the binary size on disk. PostgreSQL compresses on disk all TOASTable fields over 2 kB (such as text). Also think about binary values: they usually take less space on disk than the same value cast to text anyway.

SELECT b::text, pg_column_size(b) AS on_disk, length(b::text) AS text
FROM ( VALUES (now()) ) AS t(b)
UNION ALL 
  SELECT b::text, pg_column_size(b), length(b::text)
  FROM ( VALUES (49839489::int) ) AS t(b)
UNION ALL
  SELECT b::text, pg_column_size(b), length(b::text)
  FROM ( VALUES ('192.168.43.58'::inet) ) AS t(b);
               b               | on_disk | text 
-------------------------------+---------+------
 2017-06-05 20:56:45.978472-05 |       8 |   29
 49839489                      |       4 |    8
 192.168.43.58/32              |      10 |   16
(3 rows)
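
The other side of it is the TOAST compression itself. Here is a minimal sketch you can run against any database (it only creates a temporary table): a highly repetitive text value is stored compressed on disk, but a plain-text dump has to write it back out at full length.

CREATE TEMP TABLE toast_demo (payload text);

-- 80,000 characters of very repetitive text; values over ~2 kB are TOAST candidates
INSERT INTO toast_demo SELECT repeat('abcdefgh', 10000);

-- stored_bytes is the (compressed) on-disk size, text_chars is what a plain dump writes out
SELECT pg_column_size(payload) AS stored_bytes,
       length(payload)         AS text_chars
FROM toast_demo;

On a value this compressible, stored_bytes should come out at a small fraction of text_chars; real columns full of repetitive text, JSON, or logs behave the same way, which is the kind of gap you're seeing between pg_database_size() and the dump.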

In addition, you're going to have overhead simply from the SQL statements, the schema definitions, and the text data format itself (COPY lines, INSERTs, or whatever).
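
If you want to see how that splits out for your own database, one rough check (file names here are just examples) is to dump the schema and the data separately and compare sizes:

pg_dump --schema-only cattle_dev > cattle_dev_schema.sql
pg_dump --data-only cattle_dev > cattle_dev_data.sql
ls -lh cattle_dev_schema.sql cattle_dev_data.sql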

You may want to look into the -Z option (or xz if you need the best compression).

-Z 0..9
--compress=0..9
  Specify the compression level to use. Zero means no compression. For the custom archive format, this specifies compression of individual table-data
  segments, and the default is to compress at a moderate level. For plain text output, setting a nonzero compression level causes the entire output file
  to be compressed, as though it had been fed through gzip; but the default is not to compress. The tar archive format currently does not support
  compression at all.
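
As a concrete sketch (file names are just examples), either let pg_dump compress with the custom format, or pipe a plain dump through an external compressor:

# custom-format archive compressed at level 9; restore it with pg_restore
pg_dump -Fc -Z 9 -f cattle_dev.dump cattle_dev

# or a plain SQL dump piped through xz for a better ratio (slower)
pg_dump cattle_dev | xz -9 > cattle_dev.sql.xz

With data this compressible, either of these should come out far smaller than the 18 GB plain dump.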