MySQL database design for Twitter-like timeline

database-designMySQL

I am working on an application which requires to show different posts on user's timeline(something like Twitter).

A User's timeline consists of following:

  1. Reports which user created

  2. Reports which other user created and fall under his/her 'alert location' radius

  3. Reports posted by other users which he/she follows (Even if that report does not fall inside 'alert location' radius)
  4. Report Shared/Re-tweeted by user he follows (Even if that report does not fall inside 'alert location' radius)
  5. Reports which other user created and fall under his/her 'current location' radius when a user is travelling.
  6. Follow notification- When someone started following him/her
  7. Message notification- When someone sent a message to him/her

Note:- Once a report appear in user's timeline then it will not disappear even if user move out of location radius or changes 'alert location'. It will only disappear when it is deleted or flagged as inappropriate.

Requirement: I need an optimal DB schema(with SELECT query) so that I can show above mentioned posts on user's timeline.

Here is my current DB structure:

Table: users
   UserID (PK)
   email
   password

Table: alert_locations 
   alertLocationID (PK) 
   user_id (FK) 
   latitude 
   longitude 
   radius

Table: followers 
   follower_id (FK) 
   following_id (FK)(follower_id + following_id) = PK

Table: reports 
   reportID (PK) 
   reported_by_id (FK) 
   location 
   latitude 
   longitude 
   parent_id (FK- For retweet relations)

Now, taking into the account the above cases of timeline I am thinking of timeline table with structure shown below:

Table Name: timeline
          timelineID  (PK)
          reference_id  (Can be report_id or follower_id or message_sender_id)
          user_id   (FK) 
          title
          type       (1:For Report, 2:Follow, 3:Message)

So, when a report is created, for each user a new row in timeline table is inserted who is eligible to view that report. Using this table I can query all types of posts for user's timeline.

But as I see, this approach seems to have scaling issues and it does not look wise to insert 'n' amount of rows in timeline table every time a new report is created.

Is there any other better solution to achieve this?

Best Answer

Details not yet spelled out for the schema:

PKs -- What datatype? INT UNSIGNED (4 bytes) tops out at 4 billion; think about whether you need to go above that or can shrink the size.

Collations -- For inclusion of Chinese and Emoji, be sure to use utf8mb4.

Password -- You should probably not store the plain-text password, but rather the encrypted version.

Latitude & Longitude -- How much precision is needed? What queries will be applied. This area can lead to some terribly inefficient code. Your choices today impact your ability to even use lat/lng without a major rewrite, especially for your item 5. http://mysql.rjweb.org/doc.php/latlng#representation_choices and http://mysql.rjweb.org/doc.php/find_nearest_in_mysql

Table: followers -- Does "x follows y" imply that "y follows x"? The answer to this has an impact all the way back to the table definition.

"reference_id (Can be report_id or follower_id or message_sender_id)" -- This is not easy to implement. You may have to re-think the requirements. Ponder whether to do it as 3 tables that are UNIONed together.

If you are talking about "massive scale", you will need to allow for "sharding". I recommend postponing that until later -- but plan for a major rewrite in the near future.

If things will be "deleted", will they be "soft" or "hard" deletes or some form of "archiving". If the quantity is small, this question does not matter; if it is large, it impacts performance and may resort to Partitioning, which is a big schema change.

For performance, consider turning off FKs for production, but keeping them in development. Be sure to keep the indexes that are implicitly created for FKs.

Etc.