How Does PostgreSQL Store Many Varchars on Disk?

postgresql

I have a table that is +80 gigs with 200 million rows. I'm trying to speed it up and I noticed that it has a large number of varchar columns. In the schema, their lengths range from 15 chars to 250 chars with most set at 50. The total length of all the varchars is 850 bytes. In actual usage, a lot of the fields are null or the strings are really short.

I know that Postgres uses an 8k page size. Now if I have to do a full table scan and assuming worst case scenario, 8k / 850 = 9.6 records per page. Going through my full table is going to (and does) take a long time. In actuality however, since most of those fields are empty, how much space will be allocated on disk for those varchars? Will there be more records per page or does Postgres put in a little padding just to make things easy for updates later?

The reason I'm asking is because I'm exploring the idea of improving performance by kicking as many of the infrequently accessed varchars columns as possible out of this table and into another one that we would access via a join.

Any confirmation or denial of my logic is appreciated.

mj

Best Answer

Your considerations are sound, but all of this is already taken care of automatically by PostgreSQL:

  • A NULL value takes up no space at all in a PostgreSQL table row.

  • A varchar will only occupy as much space as the value actually has bytes. The length limit (type modifier) does not waste any space.

  • For rows that are really wide (2000 bytes and above), the varchars are first compressed and then, if the result is still too large, stored externally in a TOAST table. If you SELECT from such a row, the toasted values are not read unless you specify the column values are requested.

    But if the size limit for a row is 850 bytes, that is too small to get in that range.

You can check the estimated average row size in bytes by looking at the width column in the output of EXPLAIN SELECT * FROM tablename.

PostgreSQL v12 introduced the toast_tuple_target storage parameter which would enable you to reduce the limit for TOASTing data. If you tune that, and after rewriting the table with VACUUM (FULL), PostgreSQL can store the data in the way you want to. Then you can see if that actually gets you better performance.