For reference, almost all my DB experience is with MongoDB, and this is my first Postgres project. I'm storing data updates from decentralized finance (DeFi) crypto pools. The data structure I'm exploring is very simple.
There are tokens:
type Token =
{
Address: string // primary key
Decimals: int
Symbol: string
}
This data will NEVER change but is referenced everywhere; there are roughly 2k entries.
Then there are liquidity tokens:
type LiquidityToken =
{
Address: string // primary key
Symbol: string
Token0Address: string // matches the address from a token above
Token1Address: string // matches the address from a token above
Timestamp: DateTime // changes at every update
Token0Amount: decimal // changes at every update
Token1Amount: decimal // changes at every update
}
This is updated every 5 minutes; however, ONLY Token0Amount and Token1Amount change, and I want to keep a history.
I have roughly 500 of these objects, each with a new update every 5 minutes, so roughly 144k new entries per day.
So the first thing I wanted to do is join the Token0/1Address fields to the Token record. But since the Token only adds an int and a short string (up to 32 chars, but usually < 8 chars), I was wondering if this saves much space (I don't know how much the join itself costs).
then I was wondering if it is worth it to split the object into two pieces:
type LiquidityToken =
{
Address: string // primary key
Symbol: string
Token0Address: string // matches the address from a token above
Token1Address: string // matches the address from a token above
}
and
type LiquidityTokenUpdate =
{
Address: string // matches the address of the liquidity token
Timestamp: DateTime // changes at every update
Token0Amount: decimal // changes at every update
Token1Amount: decimal // changes at every update
}
and just append a new LiquidityTokenUpdate row at every update. But then a query may pull the update, which pulls the liquidity token, which in turn has to pull two Token objects.
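In Postgres terms, this split might look roughly like the sketch below (the table/column names and the varchar sizes are just my assumptions, not settled choices):

```sql
CREATE TABLE token (
    address  varchar(64) PRIMARY KEY,   -- on-chain address
    decimals integer     NOT NULL,
    symbol   varchar(32) NOT NULL
);

-- Static part: written once per pool
CREATE TABLE liquidity_token (
    address        varchar(64) PRIMARY KEY,
    symbol         varchar(32) NOT NULL,
    token0_address varchar(64) NOT NULL REFERENCES token (address),
    token1_address varchar(64) NOT NULL REFERENCES token (address)
);

-- Changing part: one new row per pool every 5 minutes
CREATE TABLE liquidity_token_update (
    address       varchar(64) NOT NULL REFERENCES liquidity_token (address),
    observed_at   timestamptz NOT NULL,   -- the Timestamp field
    token0_amount numeric     NOT NULL,
    token1_amount numeric     NOT NULL,
    PRIMARY KEY (address, observed_at)
);
```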
Since this is for a public GraphQL interface, quite a few usage patterns could pop up. I can't predict the specific queries, because this is something I want to make public for the community once it's ready.
I understand that caching is my friend here, and since the static data is small it all fits in RAM. But overall, would it make sense to try to save space (each record is not very large) at the cost of joins or local-cache lookups, or is it better to just save everything at each update, since disk is cheap after all?
Please keep in mind that this is not something I have experience with; I've read about joins, normalizing vs. denormalizing tables, etc., but I don't have practical experience with them. I've dealt with very large data sets, but in a non-SQL context, so this is my first time here and I hope the question provides the info needed 😀
Best Answer
I'm not the one who downvoted your question, so I can't speak on behalf of that person, but my guess is that it was because your question is a little abstract and wordy, while this site usually favors direct, concretely worded questions, so it's a little unclear exactly what your goals are / what you're asking. That said, I agree it's annoying to be downvoted without explanation, so conversely you have my upvote, because your question is fair enough.
Anyway, what I can say is that the size of your data is of no concern here for PostgreSQL. A quick calculation shows that at the rate of new data you mentioned, you'd still only be at around half a billion records after a 10-year timespan. I wouldn't be too concerned about caching either, because your object is rather small (ergo a single row in the table isn't very wide and will be small).
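To make that concrete, the back-of-the-envelope numbers (assuming the 500 pools and 5-minute interval from the question) can be checked with a throwaway query:

```sql
-- 24h × 60min ÷ 5min = 288 updates per day per pool
SELECT 500 * (24 * 60 / 5)            AS rows_per_day;      -- 144000
SELECT 500 * (24 * 60 / 5) * 365 * 10 AS rows_in_10_years;  -- 525600000, ~0.5 billion
```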
If your Token object and LiquidityToken object have a one-to-one relationship, then it doesn't realistically matter much whether you normalize them into two tables or denormalize them into one: you wouldn't have any data redundancy, and the row / table wouldn't grow much in total size by combining them into one table, so there's no concern there either.
If the relationship from Token to LiquidityToken is one-to-many, that's a different story, because you'll end up repeating properties of the Token across multiple rows of the LiquidityToken table, which can make management more difficult should one of those properties' values change.
The only thing I'd personally recommend, because it just makes sense structurally, is a third table dedicated to storing the history of the LiquidityToken table. It should store just the Token0Amount, Token1Amount, and Address fields (plus a DateTime-based field to log when each update happened), to minimize data redundancy and maximize normalization. This keeps your LiquidityToken table as lightweight as possible, which in turn makes joins to the Token table as efficient as possible. The history table would only be joined to as needed, and with the DateTime field, only as far back as needed.
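As an illustrative sketch of that suggestion (my own table and column names, assuming tables mirroring the question's Token and LiquidityToken records), the history table and a time-bounded lookup could look like:

```sql
-- History table: only the fields that change, keyed by pool + time
CREATE TABLE liquidity_token_history (
    liquidity_token_address varchar(64) NOT NULL,
    observed_at             timestamptz NOT NULL,
    token0_amount           numeric     NOT NULL,
    token1_amount           numeric     NOT NULL,
    PRIMARY KEY (liquidity_token_address, observed_at)
);

-- Join back to the static tables only as far back as needed,
-- e.g. the last 7 days for one pool:
SELECT h.observed_at, h.token0_amount, h.token1_amount,
       t0.symbol AS token0_symbol, t1.symbol AS token1_symbol
FROM liquidity_token_history AS h
JOIN liquidity_token AS lt ON lt.address = h.liquidity_token_address
JOIN token AS t0 ON t0.address = lt.token0_address
JOIN token AS t1 ON t1.address = lt.token1_address
WHERE h.liquidity_token_address = $1
  AND h.observed_at >= now() - interval '7 days'
ORDER BY h.observed_at;
```

The composite primary key on (pool address, timestamp) doubles as the index that makes the time-range filter cheap.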