Postgresql – Optimizing PostgreSQL hstore for lots of small updates

Tags: hstore, postgresql

I have a situation where I will have one web session per row, but this session will generate dozens of updates to an hstore field in that row, one k/v at a time. At the end, I will have a completed structure. The keys will be more or less the same per session, but will evolve over time. The values will be mixed, with some being unique per session (e.g. email) and some having lots of repetition (e.g. male/female).
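To make the scenario concrete, this is a minimal sketch of that update pattern; the table and column names (`sessions`, `data`) are hypothetical:

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

-- One row per web session, with the accumulating k/v structure
CREATE TABLE sessions (
    id   bigint PRIMARY KEY,
    data hstore NOT NULL DEFAULT ''::hstore
);

-- Each step of the session merges a single key/value pair into the hstore
UPDATE sessions
SET    data = data || hstore('email', 'user@example.com')
WHERE  id = 42;

UPDATE sessions
SET    data = data || hstore('gender', 'male')
WHERE  id = 42;
```

The `||` operator merges hstores, overwriting any existing key, so repeated updates converge on the completed structure.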

I am running under the assumption that lock contention will only be present at the row level, which is fine. What other gotchas should I look out for? I have zero experience with hstore, so I really want to make sure I understand what I am in for. Any feedback is appreciated.

Best Answer

With lots of UPDATEs, each one, no matter how small, will cause the entire row to be rewritten into a new version, as a consequence of the MVCC mechanism.
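You can observe the new-version-per-update behavior via the system columns; this sketch assumes the hypothetical `sessions` table from the question:

```sql
-- xmin is the transaction that created the current row version;
-- it changes on every UPDATE, even a one-key hstore change
SELECT ctid, xmin FROM sessions WHERE id = 42;

UPDATE sessions SET data = data || hstore('step', '1') WHERE id = 42;

-- xmin now differs, showing the whole row was rewritten as a new version
SELECT ctid, xmin FROM sessions WHERE id = 42;
```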
Then the old version of the row will be picked up at some point by autovacuum, once it's certain that no transaction may still need it, and its space will be flagged as reusable.
The constant turnover of disk space for old and new rows leads to fragmentation, especially if the rows are large in size.
Additionally, the whole set of operations is logged into the WAL files (unless the table is unlogged).
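If the session data really is disposable, the unlogged option mentioned above looks like this (table definition is illustrative):

```sql
-- An UNLOGGED table skips WAL entirely: much cheaper writes, but the
-- contents are truncated after a crash and are not replicated
CREATE UNLOGGED TABLE sessions (
    id   bigint PRIMARY KEY,
    data hstore
);
```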

For these reasons, high-frequency UPDATEs of large columns are a worst-case scenario for PostgreSQL.

So, if that session data doesn't really require durable storage in the first place, a specialized mixed memory/disk key/value engine like Redis is likely to perform much better.

Otherwise, this other question, What fillfactor for caching table?, has good information and advice on how to mitigate these difficulties.
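The main mitigation discussed there is lowering the table's fillfactor, which leaves free space in each page so a rewritten row version can stay on the same page (a HOT update, as long as no indexed column changed), avoiding index churn. A sketch, again using the hypothetical `sessions` table:

```sql
-- Leave half of each page free for updated row versions
ALTER TABLE sessions SET (fillfactor = 50);

-- Or set it at creation time:
-- CREATE TABLE sessions (...) WITH (fillfactor = 50);
```

Note that `ALTER TABLE ... SET (fillfactor = ...)` only affects future page layout; a `VACUUM FULL` or rewrite is needed for existing pages to honor it.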