PostgreSQL Performance – JSONB Column vs Standard Normalized Table

database-design, performance, postgresql

I'm implementing an app with a payment system, and I need to record the transactions made through the app. I also need some of the transaction information for rendering KPIs. I already have an implementation in Postgres in which my table has two columns, id and transaction (jsonb).
Inside my transaction column I have an object that looks like this:

{
   "2018": {
      "November": {
         "list_of_transactions": [],
         "totalAmountEarned": 0,
         "numberOfTransactions": 0,
         "avarageSpending": 0,
         "numberOfCoins": 0,
         "numberOfUsers": 0
      }
   }
}

Now, whenever a transaction is made, I check that the year and the month that come with the request exist (adding them to the object if not), push the transaction into list_of_transactions, and update all the other keys accordingly.
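
For reference, a minimal sketch of what that update could look like in SQL, assuming a table transactions(id, transaction), with $1 standing for the new transaction as jsonb and $2 for the row id, and assuming the application has already created the year and month objects as described (jsonb_set only creates the last missing key of a path):

UPDATE transactions
SET transaction = jsonb_set(
        transaction,
        '{2018,November,list_of_transactions}',   -- path to the month's list
        coalesce(transaction #> '{2018,November,list_of_transactions}', '[]'::jsonb)
            || $1::jsonb,                         -- append the new transaction
        true)
WHERE id = $2;
-- The other keys (totalAmountEarned, numberOfTransactions, ...) would need
-- similar jsonb_set calls or application-side recalculation.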

I was wondering whether this is a good way of tackling the problem, or whether it is in fact a really bad one. Is creating different tables and normalizing them in a "SQL fashion" a better solution? Do you have any suggestions?

Additional considerations

A side question: the structure of the data will be exactly the same every year, so I would end up creating multiple tables with the same schema and joining them. Since there are going to be many, many transactions, is it a good idea to create a new table for each year?

Best Answer

The rule of thumb in every relational database is to normalize (usually up to 3NF or even 4NF). With the advent of JSON in the relational world, people are often tempted to solve everything with JSON, as it looks like a very simple way of channeling data from the application into the database. However, while JSON has its place, more often than not I find that the data could be mostly or even entirely normalized.

Basically, whenever you see that your JSON documents share the same keys (with values of the same data type), you should extract those keys into proper columns. If the whole structure is always the same, extract everything, create tables as needed (an array in the JSON, for example, can be a sign that you need a separate table for that key), and - as basically always in the relational world - join them. A sketch of such a schema follows below.
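
As a rough illustration only (the table and column names here are made up, not taken from the question), the structure above could become something like:

CREATE TABLE app_user (
    user_id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    user_name text NOT NULL
);

CREATE TABLE payment_transaction (
    transaction_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    user_id        bigint NOT NULL REFERENCES app_user (user_id),
    occurred_at    timestamptz NOT NULL DEFAULT now(),
    amount         numeric(12,2) NOT NULL,
    coins          integer NOT NULL DEFAULT 0
);

-- Each element of list_of_transactions becomes a row in payment_transaction;
-- the year/month grouping is derived from occurred_at instead of being part
-- of the stored structure.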

Besides all the problems normalization solves, it has performance benefits, too. First, you don't store the same keys again and again with every transaction, which probably saves a lot of storage space. Second, in the typical case you can have more efficient indexes over your columns (or groups of them). Third, you can use primary keys, foreign keys, and constraints to enforce data integrity (which is partly covered by normalization anyway). Finally, RDBMSs are very good at joining tables and coming up with efficient query plans to retrieve the data you need.
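
To illustrate the last point, again using the made-up schema sketched above: the per-month KPIs kept inside the JSON would not need to be maintained by hand at all, since they can be computed on demand with a single aggregate query:

SELECT date_trunc('month', occurred_at) AS month,
       count(*)                AS number_of_transactions,
       sum(amount)             AS total_amount_earned,
       avg(amount)             AS average_spending,
       sum(coins)              AS number_of_coins,
       count(DISTINCT user_id) AS number_of_users
FROM   payment_transaction
GROUP  BY 1
ORDER  BY 1;
-- An index on occurred_at keeps this fast as the table grows.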

As for keeping a separate table for each year: this can be achieved with partitioning (and recent PostgreSQL versions bring important improvements to it). Whether you need it is hard to tell - it depends on the data volume in the first place, but on other factors, too. You could create your transaction table as partitioned, but with only a single default partition, at least if you opt for version 11 or later. That way you can decide to add yearly partitions later, should it become necessary.
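
A possible sketch, under the same made-up names as above and assuming PostgreSQL 11 or later:

CREATE TABLE payment_transaction (
    transaction_id bigserial,
    user_id        bigint NOT NULL,
    occurred_at    timestamptz NOT NULL DEFAULT now(),
    amount         numeric(12,2) NOT NULL
) PARTITION BY RANGE (occurred_at);

-- Start with nothing but a default partition...
CREATE TABLE payment_transaction_default
    PARTITION OF payment_transaction DEFAULT;

-- ...and add yearly partitions only if and when the volume justifies it
-- (rows for that range must first be moved out of the default partition):
CREATE TABLE payment_transaction_2019
    PARTITION OF payment_transaction
    FOR VALUES FROM ('2019-01-01') TO ('2020-01-01');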