Postgresql – BIG Binary column – PostgreSQL 9.2

Tags: blob, migration, postgresql

I've got the following table:

CREATE TABLE gorfs.inode_segments
(
  st_ino "gorfs"."ino_t" NOT NULL, -- Inode number the segment belongs to. Alongside segment_index, it forms the table's primary key to ensure uniqueness per relevant scope
  segment_index "gorfs"."pathname_component" NOT NULL, -- See st_ino's column description for further details. The meaning of this column varies based on the host inode type:...
  st_ino_target "gorfs"."ino_t", -- Target inode number. Meaningful for directory inode segments (objects in the directory)
  full_path "gorfs"."absolute_pathname", -- Exploded absolute canonical path for quick lookups. Meaningful only for directory inode segments (objects in the directory)
  segment_data "bytea", -- Actual data segment. Meaningful only for S_IFLNK and S_IFREG....
  CONSTRAINT pk_inode_segments PRIMARY KEY ("st_ino", "segment_index"),
  CONSTRAINT fk_host_inode_must_exist FOREIGN KEY (st_ino)
      REFERENCES gorfs.inodes (st_ino) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT fk_target_inode_must_exist FOREIGN KEY (st_ino_target)
      REFERENCES gorfs.inodes (st_ino) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT uc_no_duplicate_full_paths UNIQUE ("full_path"),
  CONSTRAINT cc_only_root_can_be_its_own_parent CHECK ("st_ino_target" IS NULL OR "st_ino"::bigint <> "st_ino_target"::bigint OR "st_ino"::bigint = 2)
)

The column "segment_data" has all the attachments from the customers, like photos, documents, audio, etc.

To improve performance, we'll be migrating all the data from the segment_data column to an NFS server.

  • Full database size: 2 TB

  • Using PostgreSQL 9.2

  • Table gorfs.inode_segments size: 1698 GB (command: SELECT pg_size_pretty(pg_total_relation_size('gorfs.inode_segments'));)

  • Basically we'll be using: UPDATE gorfs.inode_segments SET segment_data = NULL WHERE batch_number = 0

  • We won't be migrating all the data in the segment_data column at once; it will be done gradually (over a few weeks).
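Since large bytea values are stored out of line in the table's TOAST relation, it is worth checking where the space actually lives before and after each batch. A sketch of both checks and the batched clearing step (the batch_number predicate is taken from the question; I'm assuming such a column or an equivalent batching criterion exists):

```sql
-- pg_table_size includes the TOAST relation; pg_relation_size is the main
-- heap only. The difference is roughly how much lives in TOAST.
SELECT pg_size_pretty(pg_table_size('gorfs.inode_segments'))    AS heap_plus_toast,
       pg_size_pretty(pg_relation_size('gorfs.inode_segments')) AS heap_only;

-- One migration batch, run as its own transaction so autovacuum can
-- process the dead tuples between batches:
UPDATE gorfs.inode_segments
SET    segment_data = NULL
WHERE  batch_number = 0          -- hypothetical batching column
  AND  segment_data IS NOT NULL; -- skip rows already cleared
```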

Question:

Just by setting that column to NULL, will I get free space in the database? Will I have to run any command such as VACUUM, VACUUM FULL, ANALYZE, or pg_dump?

Best Answer

PostgreSQL's MVCC does not overwrite rows in place (the mechanism is described in the documentation): every UPDATE inserts a new row version into the table and leaves the old version behind as a dead tuple. So setting the column to NULL will not free any space by itself; on the contrary, the table size temporarily increases.
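You can see this behaviour on a throwaway table: after an UPDATE, the row's physical location (ctid) changes because a new tuple version was written, and the old one stays behind until it is vacuumed. The exact ctid values below are illustrative:

```sql
-- Demonstration that UPDATE writes a new row version rather than
-- overwriting in place:
CREATE TEMP TABLE t (id int, payload bytea);
INSERT INTO t VALUES (1, decode(repeat('78', 1000), 'hex')); -- 1000 bytes

SELECT ctid FROM t WHERE id = 1;   -- e.g. (0,1)

UPDATE t SET payload = NULL WHERE id = 1;

SELECT ctid FROM t WHERE id = 1;   -- a new tuple, e.g. (0,2);
                                   -- (0,1) is now a dead tuple
```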

Even though the autovacuum feature works very efficiently, it only marks dead rows (tuples) and dead pages as reusable; it does not shrink the files on disk. To actually reclaim the space after the updates, you will have to run VACUUM FULL, which rewrites the table. That way (UPDATE ... SET segment_data = NULL, then VACUUM FULL), the table will become small. Be aware that VACUUM FULL takes an ACCESS EXCLUSIVE lock on the table and needs enough free disk space to hold the rewritten copy.
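A sketch of the maintenance sequence, assuming you run it once the batches are done (the plain VACUUM between batches is optional but keeps bloat bounded):

```sql
-- After each batch: mark dead tuples reusable and refresh statistics.
-- Does NOT return space to the operating system.
VACUUM (VERBOSE, ANALYZE) gorfs.inode_segments;

-- After all batches: rewrite the table to return the space to the OS.
-- Takes an ACCESS EXCLUSIVE lock, so schedule a maintenance window.
VACUUM FULL VERBOSE gorfs.inode_segments;

-- Confirm the new size:
SELECT pg_size_pretty(pg_total_relation_size('gorfs.inode_segments'));
```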