Postgresql – Increased size of database dump even though LESS consumed storage

postgresql

For various reasons I've migrated my database schema from 4 tables with some additional indexes to keep integrity and stuff to 1 table only. That made some of the former used indexes obsolete and resulted in overall less consumed storage:

The old schema consumed ~42 GiB, while the new is ~16 GiB without the formerly available indexes and ~25 GiB with the same logical indexes. Though, a created dump of the new schema has increased from ~5,52 GiB
to 6,38 GiB. Of course I'm using the same settings to create both dumps:

pg_dump.exe "–username=%USERNAME%" "–encoding=UTF-8" "–compress=9" "–format=c" "–dbname=%DB_NAME%" > "%DMP_PATH%"

My expectation was that the dump would be smaller as well, because the data itself is the same, while lots of duplicate IDs, obsolete indexes etc. in not available tables anymore have been removed.

Any explanation for the increased dump size? Thanks!

The tables themself were changed like the following:

CREATE TABLE clt_rec
(
  id          bigserial                 NOT NULL,
  oms_rec     bigint                    NOT NULL,
  captured_at timestamp with time zone  NOT NULL,
  rssi        smallint                  NOT NULL
);

CREATE TABLE oms_rec
(
  id        bigserial NOT NULL,
  meter     integer   NOT NULL,
  encrypted bytea,
  decrypted bytea
);

CREATE TABLE clt_rec_src
(
  id          bigserial NOT NULL,
  real_estate integer   NOT NULL,
  collector   integer   NOT NULL,
  clt_rec     bigserial
);

CREATE TABLE meter_status_history
(
  id      serial  NOT NULL,
  oms_rec bigint  NOT NULL,
  status  smallint
);

vs.

CREATE TABLE datagram
(
  id            bigserial                 NOT NULL,
  src_re        integer                   NOT NULL,
  src_clt       integer                   NOT NULL,
  src_meter     integer                   NOT NULL,
  captured_at   timestamp with time zone  NOT NULL,
  captured_rssi smallint                  NOT NULL,
  oms_status    smallint                  NOT NULL,
  oms_enc       bytea,
  oms_dec       bytea
);

Best Answer

While I don't have a detailed explanation myself, there's the following hint in the docs:

For the custom archive format, this specifies compression of individual table-data segments[...]

I've changed the layout of my tables entirely and most likely have changed the "individual table-data segments" as well. So I did some further testing and got the following numbers. It's always OLD vs. NEW schema for the schema documented earlier:

--compress=0: 25  vs. 20  GiB
--compress=?: 5,6 vs. 6,5 GiB
--compress=9: 5,5 vs. 6,4 GiB

? means default settings, meaning --compress has been REMOVED from the shell command.

As can be seen, pretty much as soon as compression comes into play, the new table layout is less efficient, even though in OLD and NEW schema exactly the same bytea were used, have been placed at the end of their individual tables and alike. But things become more interesting with another version of my schema, which as well indicates a strong relationship of dump size, compression and table layout/size.

I've changed the new table datagram to be a partitioned one with partitions containing rows per year and per half-year. Each partition contains far less rows than before this way and while I only tested with --compress=9 this time, the numbers are quite interesting:

unpartitioned:     6,4 GiB
half-yearly parts: 4,8 GiB
yearly parts:      4,8 GiB

The interesting part this time is that the table layout for all partitions is the same like before, only the number of rows per table is different. Though, the number of rows overall is the same like before, the same data, IDs etc. Though, this time the dump really is smaller than with the OLD schema containing far more data because of duplicate IDs and stuff.

I wouldn't have expected table layout to be that important.

Best Answer

Related Solutions

How to Determine if a Column is Defined as Serial Data Type in PostgreSQL

Postgresql – Improve performance on concurrent UPDATEs for a timestamp column in Postgres

Related Question