I am trying to figure out the best way to store user statistics data in a table. For each user property, there can be anywhere from one to a couple thousand integer values (realistically, these would would average to a number in the hundreds). I know that PostgreSQL allows columns to contain integer[]
values, but is this really the best solution when I could potentially be storing a a couple hundred or a thousand integers in one cell?
Here is example table with a few of the user properties I might store in it:
| UserStatistics |
|-----------------------------------------------------------------------------------------------------|
| user_id | user_song_likes | user_album_likes | user_artist_likes | user_followers | user_following |
|-----------------------------------------------------------------------------------------------------|
| 1 | [12,56,34,11,7] | [1,154,32,33,84] | [5,14,54,43,10,2] | [55,63,23,2,4] | [43,78,2,4,22] |
|-----------------------------------------------------------------------------------------------------|
| 2 | [1,2,3,4,87,89] | [62,71,4,6,12,2] | [13,17,35,76,1,2] | [91,17,7,87,9] | [21,17,79,98,1] |
|-----------------------------------------------------------------------------------------------------|
| 3 | [45,53,42,19,2] | [7,14,144,67,93] | [13,14,2,23,3,73] | [11,56,5,12,1] | [34,91,23,10,5] |
|-----------------------------------------------------------------------------------------------------|
In this example, the user_song_likes
array will realistically hold hundreds of values (I scaled down the array for the sake of this example). So will the user_album_likes
array. user_artist_likes
will most likely come close to a hundred values, but it will definitely surpass that for some users. user_followers
will average a between 20 and 80 values, but there are going to be a few select users with tens of thousands of followers. user_following
is going to average between 100 and 2,000 values for almost every user.
All this considered, I think it would be very inefficient to store user statistics with integer[]
columns. What are my alternative options? What would be a reasonably efficient way to store this data?
Best Answer
As others have mentioned, I'd look to normalize the data in your
UserStatistics
table. You could do something stupid simple as a genericLikes
table and aFollows
table (don't take my poor names, I'm sure you can come up with better ones :).The
Likes
table would have its own primary key, and a foreign key field referencinguser_id
, and it could have a column calledlike_id
and a column calledlike_type
. Thelike_id
would store one row per ID from theuser_song_likes
,user_album_likes
, anduser_artist_likes
columns of your previous schema, and thelike_type
would represent if it's of typesong
,album
, orartist
.You could then implement the same idea for your
Follows
table (e.g. store one row per ID of eachfollower
andfollowing
in a genericfollow_id
column with afollow_type
column to determine which is which). Using a more normalized schema (even though this one is generic to keep things simple) is a more relational form of storing your data and will be more efficient for querying.