You write:
Each customer can have multiple sites, but only one should be
displayed in this list.
Yet your query retrieves all rows. That would be a point to optimize. You also do not define which site should be picked.
Either way, it does not matter much here. Your EXPLAIN shows only 5026 rows for the site scan (5018 for the customer scan), so hardly any customer actually has more than one site. Did you ANALYZE your tables before running EXPLAIN?
From the numbers I see in your EXPLAIN, indexes will give you nothing for this query. Sequential table scans will be the fastest possible way. Half a second is rather slow for 5000 rows, though. Maybe your database needs some general performance tuning?
Maybe the query itself is faster, but "half a second" includes network transfer? EXPLAIN ANALYZE would tell us more.
If this query is your bottleneck, I would suggest you implement a materialized view.
After you provided more information, I find that my diagnosis pretty much holds.
The query itself needs 27 ms. Not much of a problem there. "Half a second" was the kind of misunderstanding I had suspected. The slow part is the network transfer (plus ssh encoding / decoding, possibly rendering). You should only retrieve 100 rows; that would solve most of it, even if it means executing the whole query every time.
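For illustration, a sketch of such a limited fetch; the table, column and sort names here are assumptions, not taken from your query:

-- sketch only: customer, site and customer_name are placeholder names
SELECT c.customer_name, s.site_name
FROM customer c
JOIN site s ON s.customer_id = c.customer_id
ORDER BY c.customer_name
LIMIT 100 OFFSET 2700;  -- fetch only the page currently being displayed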
If you go the route with a materialized view like I proposed, you could add a gapless serial number to the table, plus an index on it, by adding a column row_number() OVER (<your sort criteria here>) AS mv_id.
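A minimal sketch of that materialized view; again, the table, column and sort names are placeholders, only mv_id and the general shape matter:

CREATE MATERIALIZED VIEW materialized_view AS
SELECT row_number() OVER (ORDER BY c.customer_name) AS mv_id,  -- gapless serial number
       c.customer_id,
       c.customer_name,
       s.site_name
FROM customer c
JOIN site s ON s.customer_id = c.customer_id;

CREATE UNIQUE INDEX ON materialized_view (mv_id);

-- refresh whenever the underlying data has changed enough to matter
REFRESH MATERIALIZED VIEW materialized_view;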
Then you can query:
SELECT *
FROM materialized_view
WHERE mv_id >= 2700
AND mv_id < 2800;
This will perform very fast. LIMIT / OFFSET cannot compete: it has to compute and sort the whole result before it can skip ahead and pick 100 rows.
pgAdmin timing
When you execute a query from the query tool, the message pane shows something like:
Total query runtime: 62 ms.
And the status line shows the same time. I quote pgAdmin help about that:
The status line will show how long the last query took to complete. If
a dataset was returned, not only the elapsed time for server execution
is displayed, but also the time to retrieve the data from the server
to the Data Output page.
If you want to see the time on the server you need to use SQL EXPLAIN ANALYZE, the built-in Shift + F7 keyboard shortcut, or Query -> Explain analyze. Then, at the bottom of the explain output, you get something like this:
Total runtime: 0.269 ms
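For example, reusing the range query from above, you would run:

EXPLAIN ANALYZE
SELECT *
FROM materialized_view
WHERE mv_id >= 2700
AND mv_id < 2800;

That number is the server-side execution time only, without the transfer of the result set to the client (newer PostgreSQL versions label it "Execution Time" instead of "Total runtime").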
The physical storage for rows is described in the docs in Database Page Layout. The column contents for the same row are all stored in the same disk page, with the notable exception of TOAST'ed contents (too large to fit in a page). Contents are extracted sequentially within each row, as explained:
To read the data you need to examine each attribute in turn. First
check whether the field is NULL according to the null bitmap. If it
is, go to the next. Then make sure you have the right alignment. If
the field is a fixed width field, then all the bytes are simply
placed.
In the simplest case (no TOAST'ed columns), postgres will fetch the entire row even if only a few columns are needed. So in this case the answer is yes: having more columns may have a clear adverse impact through wasted buffer cache, particularly if the column contents are large while still under the TOAST threshold.
Now the TOAST case: when an individual field exceeds ~2kB, the engine stores the field contents in a separate physical table. It also comes into play when the entire row doesn't fit into a page (8kB by default): some of the fields are then moved to TOAST storage. The doc says:
If it's a variable length field (attlen = -1) then it's a bit
more complicated. All variable-length data types share the common
header structure struct varlena, which includes the total length of
the stored value and some flag bits. Depending on the flags, the data
can be either inline or in a TOAST table; it might be compressed, too.
TOAST'ed contents are not fetched when they're not explicitly needed, so their effect on the total number of pages to fetch is small (a few bytes per column). This explains the results in @dezso's answer.
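If you want to see how much of a table actually lives in TOAST storage, a query along these lines helps (the table name tbl is just a placeholder):

SELECT pg_size_pretty(pg_relation_size(oid)) AS main_size,
       pg_size_pretty(pg_relation_size(NULLIF(reltoastrelid, 0))) AS toast_size
FROM pg_class
WHERE relname = 'tbl';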
As for writes, each UPDATE writes a complete new row version with all its columns, no matter which columns are changed. So having more columns is obviously more costly for writes.
Best Answer
Your considerations are sound, but all of this is already taken care of automatically by PostgreSQL:
- A NULL value takes up no space at all in a PostgreSQL table row.
- A varchar will only occupy as much space as the value actually has bytes. The length limit (type modifier) does not waste any space.
- For rows that are really wide (2000 bytes and above), the varchars are first compressed and then, if the result is still too large, stored externally in a TOAST table. If you SELECT from such a row, the toasted values are not read unless those columns are actually requested. But if the size limit for a row is 850 bytes, that is too small to get into that range.
- You can check the estimated average row size in bytes by looking at the width column in the output of EXPLAIN SELECT * FROM tablename.
- PostgreSQL v12 introduced the toast_tuple_target storage parameter, which enables you to reduce the limit for TOASTing data. If you tune that and rewrite the table with VACUUM (FULL), PostgreSQL can store the data the way you want to, as sketched below. Then you can see if that actually gets you better performance.
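A sketch of that tuning, with tablename as a placeholder:

ALTER TABLE tablename SET (toast_tuple_target = 256);  -- lower the TOAST threshold (valid from 128 bytes up)
VACUUM (FULL) tablename;  -- rewrite the table so the new setting applies to existing rows

-- afterwards, check the estimated average row width again
EXPLAIN SELECT * FROM tablename;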