There's a popular and seemingly authoritative blog post called On Rocks and Sand on how to optimize PostgreSQL tables for size to eliminate internal padding by re-ordering their column length. They explain how variable-length types incur some extra padding if they're not at the end of the table:
This means we can chain variable length columns all day long without introducing padding except at the right boundary. Consequently, we can deduce that variable length columns introduce no bloat so long as they’re at the end of a column listing.
And at the end of the post, to summarize:
Sort the columns by their type length as defined in pg_type.
There's a library that integrates with Ruby's ActiveRecord to automatically re-order columns to reduce padding called pg_column_byte_packer. You can see the README in that repo cites the above blog post and in general does the same thing that the blog post describes.
However, the pg_column_byte_packer
does not return results consistent with the blog post it cites. The blog post pulls from from PostgreSQL's internal pg_type.typelen
which puts variable-length columns always at the end via an alignment of -1. pg_column_byte_packer
gives them an alignment of 3.
pg_column_byte_packer
has an explanatory comment:
# These types generally have an alignment of 4 (as designated by pg_type
# having a typalign value of 'i', but they're special in that small values
# have an optimized storage layout. Beyond the optimized storage layout, though,
# these small values also are not required to respect the alignment the type
# would otherwise have. Specifically, values with a size of at most 127 bytes
# aren't aligned. That 127 byte cap, however, includes an overhead byte to store
# the length, and so in reality the max is 126 bytes. Interestingly TOASTable
# values are also treated that way, but we don't have a good way of knowing which
# values those will be.
#
# See: `fill_val()` in src/backend/access/common/heaptuple.c (in the conditional
# `else if (att->attlen == -1)` branch.
#
# When no limit modifier has been applied we don't have a good heuristic for
# determining which columns are likely to be long or short, so we currently
# just slot them all after the columns we believe will always be long.
The comment appears to be not wrong as text columns do have a pg_type.typalign
of 4 but they've also got a pg_type.typlen
of -1 which the blog post argues gets the most optimal packing when at the end of the table.
So in the case of a table that has an integer
column, a text
column, and a smallint
column, pg_column_byte_packer
will put the text columns right in between the two. They've even got a unit test to assert that this always happens.
My question here is: what order of columns actually packs for minimal space? The comment from pg_column_byte_packer
appears to be not wrong as text columns do have a pg_type.typalign
of 4, but they've also got a pg_type.typlen
of -1.
Best Answer
Had me puzzled, too, when I researched it a couple of years back.
typlen = -1
just indicatesvarlena
storage, which nominally hastypalign = 'i'
(integer alignment, needs to start at 4-bytes offset). But that's not the whole story. Eventually, I found the explanation in a note in the source code:So a
varlena
datum < 127 bytes (after possible compression) only adds 1 byte of overhead (signifying its length) and requires no alignment padding "on disk". (Hardly any "disks" any more these days.)See:
To answer the question in the title:
True-ish. Datums exceeding 127 bytes cannot be stored in optimized form and fall back to requiring "integer alignment". If we don't know that a varlena-type column will stay below that threshold (most of the time), we can't say for sure.
Also, there are additional considerations for optimized storage. With many columns in a row, it's cheaper to compute tuple storage offsets with
NOT NULL
, fixed-size length columns first. Placing frequently accessed columns first also yields a tiny advantage. All of this is further complicated by the TOAST mechanism and index(-only) access.But all of these effects are typically tiny. And losing 3 bytes to alignment padding pales in comparison when a column occupies 200 bytes. So mostly not worth the bother. The rule of thumb covers most of it:
Sort columns by their required alignment
typalign
:d
-->i
-->s
-->c
.But
typlen = -1
("varlena") last (typically), even though formallytypalign = 'i'
.The manual:
Your example
pg_column_byte_packer
is living up to its name.int
-->text
-->smallint
is as tight as it gets.For the typical case of short strings, the only relevant decision is to put
int
first.smallint
can at most force 1 additional byte of alignment padding at odd byte offsets. Since tuple space is always allocated in multiples of 8 bytes, this can never result in a bigger tuple.Strings exceeding the threshold of 127 bytes on disk (including 1 leading length-byte), flip to 4 leading length bytes and require the nominal
integer
alignment. That's where puttingtext
beforesmallint
can safe 8 bytes effectively. With random string length that happens in 25 % of cases, so 2 bytes on average for tuples of at least 144 bytes.There's that. But putting the
smallint
first typically has tiny advantages, and mosttext
columns stay well below the length threshold.The thing to remember is not to intersperse multiple
smallint
andtext
columns. Multiple offsets can add up in this case.