Postgresql – How do completely empty columns in a large table affect performance

database-designdisk-spaceperformancepostgresqlpostgresql-performancestorage

I have 400 million rows in a Postgres db, and the table has 18 columns:

id serial NOT NULL,
a integer,
b integer,
c integer,
d smallint,
e timestamp without time zone,
f smallint,
g timestamp without time zone,
h integer,
i timestamp without time zone,
j integer,
k character varying(32),
l integer,
m smallint,
n smallint,
o character varying(36),
p character varying(100),
q character varying(100)

Columns e, k, and n are all NULL, they do not store any values at all and are completely useless at this point. They were part of the original design, but never removed.

Edit – most of the other columns are non-NULL.

Questions:

  1. How can I calculate the impact this has on storage?
    Is it equal to size of column * # of rows ?

  2. Will dropping these empty columns noticeably improve performance for this table?
    Would the page cache be able to fit more rows?

Best Answer

Since the columns e, k, and n can be NULL, I assume "100% empty" means NULL.

NULL storage is cheap. Each NULL "costs" one bit in the null bitmap for storage and otherwise hardly affects performance. Effective storage requirements depend on whether the null bitmap for each row already exists and has room for 3 more bits.

Tables with up to 8 user columns utilize a spare byte after the tuple header for the null bitmap in the current implementation. For any more, another MAXALIGN (typically 8) bytes are allocated providing for another 64 columns (for a total 72 now). Etc. So the null bitmap effectively costs 8 bytes per row for most tables, including yours which has 18 columns.

The null bitmap is allocated completely or not at all per row and only if there is at least one actual NULL value. If all columns are defined NOT NULL, there can never be one.

In the absence of other, more important considerations, move columns that are always (or mostly) NULL to the end of the row. Helps performance a tiny bit.

Answer to question 1.

The effective storage requirement is:

  • 0 bytes per row where other columns are also NULL.
  • 8 bytes (typically) per row where no other column is NULL, for allocating the null bitmap

Answer to question 2.

Dropping would not buy you much. Page cache would be the same or slightly less depending on 1. Handling queries, backups, source code and such would be simplified.

Further reading:

What you did not ask for

After a round of "column tetris" I suggest this table layout - again barring other, more important considerations:

  id serial NOT NULL
, a  integer
, g  timestamp
, i  timestamp
, b  integer
, c  integer
, h  integer
, j  integer
, l  integer
, d  smallint
, f  smallint
, m  smallint
, o  varchar(36)
, p  varchar(100)
, q  varchar(100)
, k  varchar(32)      --  always NULL
, n  smallint         --  always NULL
, e  timestamp        --  always NULL

Saves at least 16 bytes per row compared to your original layout, probably around 20+, depending on your varchar columns. Instructions for "column tetris" here:

How to reorder columns

The simplest method is to create a new table and copy data into it. Then you get a pristine new table without dead columns (and dead rows). You could sort (cluster) the physical order of rows in the process, too.

Changes in place are possible too, but there is no convenient tool (I would know of) to generally reorder the sequence of columns. Depending objects like views, functions (depends), FK constraints, indexes etc. are limiting factor here. Those can block you from deleting a column, you have to drop and recreate depending objects.

And of course, concurrent access to the table conflicts with DDL commands changing it. You need exclusive table locks in this case.

It's simple and cheap to delete completely empty columns and add them at the end of the row. Switching the sequence of populated columns in the middle of the table is not as simple or cheap. I would create a new table for that.

This related answer has recipes and links to more: