The key words here are:
- "heavily updated"
- "in the table for 2-3 hours".
Point 1 is an indication for a lower fill factor, while point 2 argues the opposite: it helps performance if multiple row versions are stored on the same data page. H.O.T. updates would achieve that. Read here or here. They need some wiggle room on the data page - like dead tuples or space reserved by a fillfactor < 100. But they can only do their thing if no index involves any of the updated columns, which should be true for your case.
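To make this concrete, here is a minimal sketch of reserving that wiggle room at table-creation time. The table and column names are purely illustrative, not taken from your schema:

```sql
-- Hypothetical table; only session_id is indexed (via the PK).
CREATE TABLE session_data (
    session_id bigint PRIMARY KEY,
    payload    text NOT NULL,
    updated_at timestamptz NOT NULL DEFAULT now()
) WITH (fillfactor = 90);  -- leave ~10 % free space per page for HOT updates

-- This update touches no indexed column, so it stays HOT-eligible:
UPDATE session_data
SET    payload = 'new value', updated_at = now()
WHERE  session_id = 1;
```

If you later add an index on `updated_at`, updates touching that column can no longer be HOT.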
Another important factor here is the tuple size in comparison to your page size (which is most commonly 8 kB). More details in this related answer:
If the tuple size is 4 kB or more, reducing the fill factor would be futile, since there can never be more than one tuple on a data page. You might as well leave it at 100 (which is the default anyway). However, some data types are "toasted" and stored out-of-line if they exceed a size limit, so tuples requiring that much space in the main relation fork are rare.
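You can measure actual tuple sizes before deciding whether a lower fillfactor can help at all. A sketch, again using the hypothetical table name from above:

```sql
-- Average on-page tuple size (after any TOAST compression / out-of-line storage):
SELECT avg(pg_column_size(t.*))::int AS avg_tuple_bytes
FROM   session_data t;

-- With the default 8 kB page, results near 4096 bytes or more mean
-- only one tuple fits per page, so lowering fillfactor cannot help.
```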
Whatever you do, `VACUUM` will be run often. And that's generally a good thing; I wouldn't worry about that. You create lots of dead tuples. `VACUUM` identifies dead rows that are not visible to any open transaction any more. The manual:

> The standard form of `VACUUM` **removes dead row versions in tables and indexes and marks the space available for future reuse.**

Bold emphasis mine.
You can play with per-table settings for autovacuum to trigger it less (or more) often for this table only:

> The default thresholds and scale factors are taken from `postgresql.conf`, **but it is possible to override them on a table-by-table basis**;
Bold emphasis mine. In particular with `autovacuum_vacuum_threshold` and `autovacuum_vacuum_scale_factor`. Running `VACUUM` a lot might actually be a good idea, instead of a very low fillfactor. That depends on access patterns. If all tuples live, say, 3 hours and each is updated several times, I would still lower the fillfactor to something like 50. You'll have to test and find the sweet spot.
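Both knobs can be combined in a single per-table setting. A sketch with illustrative values (the table name and numbers are mine, not from the question; tune them by testing):

```sql
ALTER TABLE session_data SET (
    autovacuum_vacuum_threshold    = 1000,  -- absolute minimum of dead rows
    autovacuum_vacuum_scale_factor = 0.01,  -- plus ~1 % of the table
    fillfactor                     = 50     -- reserve half of each page
);

-- fillfactor only applies to newly written pages; to rewrite existing
-- pages, run (takes an exclusive lock on the table!):
VACUUM FULL session_data;
```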
Alternatives
All this aside, since your data seem to be volatile to begin with: use an UNLOGGED table:

> Data written to unlogged tables is not written to the write-ahead log (see Chapter 29), which makes them **considerably faster than ordinary tables**. However, they are **not crash-safe**: an unlogged table is automatically truncated after a crash or unclean shutdown. The contents of an unlogged table are also not replicated to standby servers.

Bold emphasis mine. Don't use this if your server might crash and you still need the data afterwards. But if we are talking about session data for web applications, this might be an acceptable price to pay.
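A sketch of both ways to get there, with the same illustrative table name as above:

```sql
-- Either create the table as unlogged from the start ...
CREATE UNLOGGED TABLE session_data_unlogged (
    session_id bigint PRIMARY KEY,
    payload    text NOT NULL
);

-- ... or convert an existing table (Postgres 9.5+; rewrites the whole
-- table and takes an ACCESS EXCLUSIVE lock while doing so):
ALTER TABLE session_data SET UNLOGGED;
```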
Or, even more radical: Use a key-value store like Redis if you can do without the features and security provided by an RDBMS altogether.
You're addressing the performance problems that arise from random access, which, as you point out, is solved well enough by eliminating the cost of random access altogether with an SSD. However, Postgres does offer "Heap Only Tuple" (HOT) updates (some notes here and here and here), which may make a fillfactor below 100 % worthwhile regardless of the physical medium, provided you make updates moderately frequently (the more frequent, the greater the savings) and those updates do not affect any indexed columns.
Imagine the fillfactor were exactly 100 % and, over time, your table's data fills exactly one page. On the next update (of just one row, say), a copy of the updated row is written to the next page and the original row is left on the current page; once the transaction commits, the pre-update version of that row is marked as dead, to be removed by a vacuum, and the post-update version is marked as live. Now, say that update only affected a column data_value for a specific PK value, like `update table_name set data_value = 1 where pk = 3`. In this case, the heap location of the row changed pages, so the index supporting the PK constraint has to be updated to reflect that change. Regardless of the physical medium, that's a potential performance hit. Moreover, that dead tuple is left lingering until vacuum reclaims the space.
With HOT updates, as long as you don't update an indexed column, the database can avoid having to update index entries by keeping the post-update tuple on the same page as the pre-update tuple, which can speed up your updates. On top of that, HOT updates provide a bit of automatic maintenance and keep the page freer of those dead rows, reducing the need for vacuuming and auto-vacuuming.
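You can check how many of your updates actually go the HOT path via the statistics collector. A sketch (substitute your real table name for the placeholder):

```sql
-- n_tup_upd counts all updates, n_tup_hot_upd only the HOT ones:
SELECT n_tup_upd,
       n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 1) AS hot_pct
FROM   pg_stat_user_tables
WHERE  relname = 'table_name';
```

If `hot_pct` stays low despite frequent non-index updates, the pages may be too full, which is exactly what a lower fillfactor addresses.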
Also, even in the case of data access, you might still benefit from having your data localized. Even if a random page read from an SSD takes a negligible amount of time, performing many reads is still (many * negligible time) more expensive than reading only the minimum number of pages required to hold the data.
Best Answer
No generalised quantitative method exists to follow. One would likely have come up by now (1+ years since the question was asked), and I have not come across one in my internet searches either.