PostgreSQL – table organization with data that will never change

join; normalization; postgresql

For reference, almost all my DB experience is with MongoDB, and this is my first Postgres project.

I'm storing data updates from decentralized finance crypto pools.

The data structure I'm exploring is very simple.
There are tokens:

type Token =
    {
        Address:    string     // primary key
        Decimals:   int
        Symbol:     string
    }

This data will NEVER change but is referenced everywhere; roughly 2k entries

type LiquidityToken =
    {
        Address:        string    // primary key
        Symbol:         string
        Token0Address:  string    // matches the address from a token above
        Token1Address:  string    // matches the address from a token above
        Timestamp:      DateTime  // changes at every update
        Token0Amount:   decimal   // changes at every update
        Token1Amount:   decimal   // changes at every update
    }

This is updated every 5 minutes; however, ONLY Token0Amount and Token1Amount change, and I want to keep a history.

I have roughly 500 of these objects, each with a new update every 5 minutes, so about 144k new entries per day (500 × 288 updates).

So the first thing I wanted to do is make a join between the Token0Address/Token1Address fields and the Token record. But since a Token only adds an int and a short string (up to 32 chars, but usually fewer than 8), I was wondering whether this saves much space, given that I don't know how much the join itself costs.
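For concreteness, this is roughly the schema I have in mind; the SQL names and types below are just my guesses at a reasonable mapping, not a settled design:

-- Reference data: ~2k rows, never changes
CREATE TABLE token (
    address  text PRIMARY KEY,
    decimals integer NOT NULL,
    symbol   text NOT NULL
);

-- One row per pool, amounts overwritten in place (the single-object design)
CREATE TABLE liquidity_token (
    address        text PRIMARY KEY,
    symbol         text NOT NULL,
    token0_address text NOT NULL REFERENCES token (address),
    token1_address text NOT NULL REFERENCES token (address),
    updated_at     timestamptz NOT NULL,  -- the Timestamp field
    token0_amount  numeric NOT NULL,
    token1_amount  numeric NOT NULL
);

-- The kind of join I mean: resolve both token references for each pool
SELECT lt.address, t0.symbol AS token0_symbol, t1.symbol AS token1_symbol
FROM liquidity_token lt
JOIN token t0 ON t0.address = lt.token0_address
JOIN token t1 ON t1.address = lt.token1_address;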

Then I was wondering if it is worth splitting the object into two pieces:

type LiquidityToken =
    {
        Address:        string    // primary key
        Symbol:         string
        Token0Address:  string    // matches the address from a token above
        Token1Address:  string    // matches the address from a token above
    }

and

type LiquidityTokenUpdate =
    {
        Address:        string    // matches the address of the liquidity token 
        Timestamp:      DateTime  // changes at every update
        Token0Amount:   decimal   // changes at every update
        Token1Amount:   decimal   // changes at every update
    }

and just keep inserting new rows of this update object. But then a query may pull the update, which pulls the liquidity token, which in turn has to pull two Token objects.
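If I'm reading the SQL right, that chain would look something like this with the split schema (again, hypothetical names, and assuming liquidity_token keeps only the static columns):

-- Recent updates, resolving the pool and both underlying tokens
SELECT u.address, u.updated_at, u.token0_amount, u.token1_amount,
       lt.symbol AS pool_symbol,
       t0.symbol AS token0_symbol,
       t1.symbol AS token1_symbol
FROM liquidity_token_update u
JOIN liquidity_token lt ON lt.address = u.address
JOIN token t0 ON t0.address = lt.token0_address
JOIN token t1 ON t1.address = lt.token1_address
WHERE u.updated_at > now() - interval '1 hour';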

Since this is for a public GraphQL interface, quite a few usage patterns may pop up. I can't predict specific query shapes because I want to make this public for the community once it's ready.

I understand that caching is my friend here, and since the static data is small it all fits in RAM. But overall, does it make sense to try to save space (each record is not very large) and pay the cost of the joins, or of lookups in a local cache, or is it better to just store everything at each update, since disk is cheap after all?

Please keep in mind that this is not something I have experience with; I've read about joins, normalizing vs. denormalizing tables, etc., but I don't have practical experience with them. I've dealt with very large data sets, but in a non-SQL context, and this is my first time here, so I hope the question provides the info needed 😀

Best Answer

I'm not the one who downvoted your question, so I can't speak on that person's behalf, but my guess is that it's because your question is a little abstract and wordy, while this site usually favors more direct, concretely worded questions; it's a little unclear exactly what your goals are / what you're asking. That being said, I agree it is annoying to be downvoted without explanation, so conversely you have my upvote, because your question is fair enough.

Anyway, what I can say is that the size of your data is of no concern here for PostgreSQL. A quick calculation shows that at the rate of new data you mentioned (about 144k rows per day), you'd accumulate roughly half a billion records over a 10-year span (144,000 × 3,650 ≈ 525 million). I wouldn't even be that concerned about caching, because your object is rather small (so a single row in the table isn't very wide).

If your Token object and LiquidityToken object have a one-to-one relationship, then it doesn't realistically matter too much whether you normalize them into two tables or denormalize them into one, because you wouldn't have data redundancy, and the row/table wouldn't grow much in total size by combining them into one table, so there's no concern there either.

If the relationship from Token to LiquidityToken is one-to-many, then that's a different story, because you'll end up repeating properties of the Token across multiple rows of the LiquidityToken table, which can make management more difficult should one of those property values ever change.
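To illustrate what that repetition would look like if you denormalized (hypothetical column names), the Token properties get copied into every pool row that references them:

-- Denormalized variant: token symbols/decimals copied into each pool row
CREATE TABLE liquidity_token_denormalized (
    address         text PRIMARY KEY,
    symbol          text NOT NULL,
    token0_address  text NOT NULL,
    token0_symbol   text NOT NULL,      -- repeated from token
    token0_decimals integer NOT NULL,   -- repeated from token
    token1_address  text NOT NULL,
    token1_symbol   text NOT NULL,      -- repeated from token
    token1_decimals integer NOT NULL    -- repeated from token
);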

The only thing I'd personally recommend, because it just makes sense structurally, is having a third, separate table dedicated to storing the history of the LiquidityToken table. It would store only the Address, Token0Amount, and Token1Amount fields (and probably a DateTime-based field to log when each update happened), to minimize data redundancy and maximize normalization. This keeps your LiquidityToken table as lightweight as possible, which in turn keeps the joins to the Token table as efficient as possible. The history table would only be joined to as needed, and if you have the DateTime field, only as far back as needed.
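That history table is essentially the LiquidityTokenUpdate split you already sketched. A minimal sketch, reusing the hypothetical names from your question, with the time-bounded kind of query I mean:

-- History of amount changes only; liquidity_token keeps just the static columns
-- (address, symbol, token0_address, token1_address)
CREATE TABLE liquidity_token_update (
    address       text NOT NULL REFERENCES liquidity_token (address),
    updated_at    timestamptz NOT NULL,
    token0_amount numeric NOT NULL,
    token1_amount numeric NOT NULL,
    PRIMARY KEY (address, updated_at)
);

-- Join to the history only as far back as needed, e.g. the last 7 days for one pool
SELECT u.updated_at, u.token0_amount, u.token1_amount, lt.symbol
FROM liquidity_token_update u
JOIN liquidity_token lt ON lt.address = u.address
WHERE u.address = $1                      -- pool address supplied as a query parameter
  AND u.updated_at >= now() - interval '7 days'
ORDER BY u.updated_at;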