PostgreSQL – JSONB vs BRIN Index Comparison

postgresqlpostgresql-10postgresql-11

The requirements go like this:

  • Each user has a wallet that can contain tens to a hundred or so different
    currencies, and for each currency an amount

  • Number of currency can grow > 1600 (make me anxious about column limit)

  • User base in the millions

  • A lot of reads for the whole wallet, a bit of write for a currency's amount

  • Nightly sum of total amount from all users for each currency

So far I am thinking of 2 options:

  • Each user has a {"currency":amount} JSONB field, or

  • A 500M+ rows table with user_id, currency_id, amount, clustered and use BRIN index on user_id

Which way should I go? Thank you for your advice.

Best Answer

Assuming a mostly immutable set ~ 100 currencies overall (you haven't been clear on that), and your given requirements, consider the simple approach: 1 table with 1 row per user and 1 column per currency. Like:

CREATE TABLE wallet (
   user_id   integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY 
 , currency1 integer -- or numeric, depends on missing info
 , currency2 integer
 , ...
 , currency100 integer
);

This has a massively smaller disk footprint than either of your two options so far.

4 bytes per currency in use (with integer), plus 16 bytes for the NULL bitmap. NULL storage is very cheap. See:

Data type integer or numeric?

Your option 1 (jsonb) at least doubles the size per currency in use by storing a key name for every amount. Wins with only very few currencies per user, storage-wise. Sums, calculations, indexing are slower and more complicated. Data integrity is hard to enforce.

Your option 2 occupies ~ 44 bytes per currency (separate row). Very clean data model, flexible for adding / removing currencies on the fly, but wastes a lot of space, which makes everything slow.


  • A lot of reads for the whole wallet are as simple as:

    SELECT * FROM wallet WHERE user_id = 123;
    

    You only need an index on user_id, which is provided by the PK.

  • Getting the nightly sum of total amount from all users for each currency is as simple and as fast as can be:

    SELECT sum(currency1), sum(currency2), ... FROM wallet;
    

    No index for that.


If you have a couple of dozen currencies covering the lion's share of all entries, you could try a combined strategy: fixed columns for the regulars and a jsonb column for the rest. This combines minimum storage size with absolute flexibility - at the cost of more complicated queries and computations, as you have to combine both now. And much weaker means to enforce integrity.

CREATE TABLE wallet (
   user_id   integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY 
 , currency1 integer -- or numeric, depends on missing info
 , currency2 integer
 , ...
 , currency70 integer
 , chickenfeed jsonb
);

I chose 70 currency columns to stay below the local optimum of 72 columns, before another 8 bytes are allocated for the NULL bitmap. A minor consideration. Chose a number that fits your data distribution.

Maintain a table of all allowed currencies - you do not want to search millions of rows to get the complete list. And use minimum-length key names in the jsonb column, like '{"A1":123}' (2 bytes for the key) so not to waste GB of storage to repeating lengthy names over and over.