Postgresql – Optimizing column types for reduced overall database size

postgresqlpostgresql-9.4schemastorage

Given the following table:

CREATE TABLE avl_historico (
    fecha timestamp with time zone NOT NULL,
    latitud double precision DEFAULT 0 NOT NULL,
    longitud double precision DEFAULT 0 NOT NULL,
    altitud double precision DEFAULT 0 NOT NULL,
    velocidad double precision DEFAULT 0 NOT NULL,
    cog double precision DEFAULT 0 NOT NULL,
    nsat integer DEFAULT 0 NOT NULL,
    tipo character(1),
    utc_hora time without time zone,
    fix_fecha date,
    imei bigint NOT NULL,
    registro timestamp with time zone,
    input1 integer DEFAULT 0,
    input2 integer DEFAULT 0,
    input3 integer DEFAULT 0,
    input4 integer DEFAULT 0,
    hdop double precision,
    adc double precision DEFAULT (-99),
    ignicion integer DEFAULT 1,
    adc2 double precision,
    power integer,
    driverid integer,
    ibutton2 integer,
    ibutton3 integer,
    ibutton4 integer,
    trailerid integer,
    adc3 double precision,
    adc4 double precision,
    horometro bigint,
    odometro bigint,
    panico integer DEFAULT 0,
    bateria double precision,
    bateriaint double precision
);

Ref: 240 bytes / row using pg_column_size()

With about 15K tables (Over 1TB total used space) that inherit from this table using "partitioning" based on the imei column as in following example:

CREATE TABLE avl_historico_201424 (
    CONSTRAINT reports_avl_historico_only_201424 CHECK ((imei = 201424))
)
INHERITS (avl_historico);

And, considering that several column types are "bigger" that what they are actually used for. For example: column tipo could be changed into boolean and, several integer columns into smallint.

Would altering some column types actually reduce overall database size?

Specs:

  • PostgreSQL 9.4.9 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit
  • Debian GNU/Linux 8.6 (jessie)
  • /dbdata (ext4): 2.1TB used out of 2.9TB
  • Sector size (logical/physical): 512 bytes / 512 bytes

Should I provide any additional info for better understanding of the underlying structure and/or data?

Reference for numeric types: https://www.postgresql.org/docs/9.4/static/datatype-numeric.html

Best Answer

In addition to the links a_horse_with_no_name provided on padding a few points here,

This is tremendously wasteful.

latitud double precision DEFAULT 0 NOT NULL,
longitud double precision DEFAULT 0 NOT NULL,
altitud double precision DEFAULT 0 NOT NULL,
velocidad double precision DEFAULT 0 NOT NULL,

You're storing each 0 as a bigint rather than dealing with null which would be (almost) free one bit in the null bitmap.

USE NULL, stop using 0 to represent defaults. Your average row size will drastically shrink.

Also,

    odometro bigint,

Are you measuring kilometers in micrometers? That's a gigantic number for any odometer representation.