MySQL: Schema/Query Performance Approach for aggregated mailbox folders

Tags: aggregate, mysql, schema

I am about to build a messaging system in which users can write messages to other users. Users can create custom inbox folders to sort the messages they receive; in addition, every user has two main inboxes:

a) with all messages they receive
b) all messages they receive filtered by users they are friends with

For every mailbox, the total and unread message counts should be displayed.

This leads me to the following tables:

 - users (id, username,...)
 - friends (user_id, friend_id)
 - messages (id, sender_id, receiver_id, read_status, mailbox_id, ...)
 - mailbox (id, name, owner_id, ...)

My question regarding performance (thinking of a situation with many messages and users):

Is it better to calculate the total/unread values for each mailbox on each page view?

SELECT COUNT(*) AS total,
.... ,
GROUP BY mailbox_id, friend_status, read_status

Or is it better to store this information in the mailbox table, in extra fields such as total_msg, total_unread, total_friends, total_friends_unread, and update these values on each event (new message, message read, friendship added, friendship ended, etc.)?

Is there a third approach worth considering?

Thanks for help!

Best Answer

The "best way" is probably the former: calculating the counts of messages and friends on demand keeps them accurate and is easier to manage on the application side. Performance will depend on how many messages and friends there are, on the database server's configuration, and on your indexes.
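As a rough sketch of the on-demand approach, all four counters can come from one grouped query using conditional aggregation. This uses Python's sqlite3 as a stand-in for MySQL so it runs anywhere. The sample rows, the convention that read_status = 0 means unread, and the helper name mailbox_counts are assumptions for illustration, not part of the original schema description.

```python
import sqlite3

# SQLite stands in for MySQL here; the aggregation SQL itself is plain standard SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE friends  (user_id INTEGER, friend_id INTEGER);
CREATE TABLE messages (id INTEGER PRIMARY KEY, sender_id INTEGER,
                       receiver_id INTEGER, read_status INTEGER,
                       mailbox_id INTEGER);
INSERT INTO friends VALUES (1, 2);           -- user 1 is friends with user 2
INSERT INTO messages (sender_id, receiver_id, read_status, mailbox_id) VALUES
    (2, 1, 0, 10),   -- unread, from a friend
    (2, 1, 1, 10),   -- read, from a friend
    (3, 1, 0, 10),   -- unread, from a non-friend
    (3, 1, 1, 11);   -- read, from a non-friend, other mailbox
""")

# Assumes read_status = 0 means "unread" and that friends holds at most one
# row per (user_id, friend_id); duplicates would inflate the counts.
COUNTS_SQL = """
SELECT m.mailbox_id,
       COUNT(*) AS total,
       SUM(CASE WHEN m.read_status = 0 THEN 1 ELSE 0 END) AS unread,
       SUM(CASE WHEN f.friend_id IS NOT NULL THEN 1 ELSE 0 END) AS friends_total,
       SUM(CASE WHEN f.friend_id IS NOT NULL AND m.read_status = 0
                THEN 1 ELSE 0 END) AS friends_unread
FROM messages m
LEFT JOIN friends f
       ON f.user_id = m.receiver_id AND f.friend_id = m.sender_id
WHERE m.receiver_id = ?
GROUP BY m.mailbox_id
ORDER BY m.mailbox_id
"""


def mailbox_counts(conn, user_id):
    """Return (mailbox_id, total, unread, friends_total, friends_unread) rows."""
    return conn.execute(COUNTS_SQL, (user_id,)).fetchall()


counts = mailbox_counts(conn, 1)
print(counts)
```

On a real MySQL table, an index starting with receiver_id (for example on receiver_id, mailbox_id, read_status) would likely matter more for this query than the exact way it is written.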

A third solution would be to use a caching layer (such as memcached) to store values that don't change often. Memcached supports increment and decrement operations, which work well for updating your unread counts. Message counts only change when new messages are received, and friend counts only change when new friends are added (possibly with the added condition that the friend has sent a message). Since these values update infrequently, memcached is a good fit here.
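To make the increment/decrement idea concrete, here is a minimal sketch in Python. CounterCache is a hypothetical in-memory stand-in for a real memcached client, written only so the example runs without a server. It mimics two memcached behaviours: incrementing a missing key fails rather than creating it, and decrementing never goes below zero. The key format and the event handler names are also assumptions.

```python
class CounterCache:
    """Hypothetical in-memory stand-in for a memcached client."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = int(value)

    def incr(self, key, delta=1):
        # Like memcached, incrementing a missing key fails instead of creating it.
        if key not in self._store:
            return None
        self._store[key] += delta
        return self._store[key]

    def decr(self, key, delta=1):
        # Like memcached, a decrement never takes the value below zero.
        if key not in self._store:
            return None
        self._store[key] = max(0, self._store[key] - delta)
        return self._store[key]


def unread_key(mailbox_id):
    return f"mailbox:{mailbox_id}:unread"


def on_new_message(cache, mailbox_id, count_from_db):
    # On a cache miss, seed the counter from the database (the source of truth);
    # otherwise the increment alone is enough.
    if cache.incr(unread_key(mailbox_id)) is None:
        cache.set(unread_key(mailbox_id), count_from_db())


def on_message_read(cache, mailbox_id, count_from_db):
    if cache.decr(unread_key(mailbox_id)) is None:
        cache.set(unread_key(mailbox_id), count_from_db())


cache = CounterCache()
on_new_message(cache, 10, lambda: 1)    # miss: seeded with the count from the DB
on_new_message(cache, 10, lambda: 2)    # hit: incremented, the callable is not used
on_message_read(cache, 10, lambda: 1)   # hit: decremented
unread = cache.get(unread_key(10))
print(unread)
```

The database stays authoritative in this design: if a key is evicted or the cache restarts, the next event simply re-seeds the counter from a COUNT query.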

That being said, stressing about the performance of that schema before you have real data on how it is used and how it performs definitely falls into the camp of "over optimizing." Adding approach #2 or #3 later is straightforward once you have the system in place. Build it first, do what you believe is best, and don't over-optimize or over-engineer. Once it's online, monitor performance and fix bottlenecks as you identify them.

Related Solutions

MySQL – How to get non-deleted messages for one user_id if the same user_id appears in two tables at the same time

The problem is that query #4 needs a small adjustment. The query now reads:

SELECT p.*
    , rcp.*
    , msg.*
    , msg.id as message_id 
FROM default_messages msg 
    LEFT JOIN default_recipient rcp ON (msg.id = rcp.message_id) 
    LEFT JOIN default_profiles p ON (p.user_id = msg.sender_user_id) 
WHERE (msg.sender_user_id = 1 AND msg.deleted = 0) 
       OR (rcp.user_id = 1 AND rcp.deleted = 0) 
ORDER BY msg.date DESC;
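To see what the WHERE clause keeps and drops, here is a small runnable sketch using Python's sqlite3 in place of MySQL. The table definitions, the display_name column, and the sample rows are assumptions made up for the demo; only the join and filter logic is taken from the query above, and the select list is trimmed to message_id for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Column lists are assumed; only the columns used by the query matter here.
CREATE TABLE default_messages  (id INTEGER PRIMARY KEY, sender_user_id INTEGER,
                                deleted INTEGER, date TEXT);
CREATE TABLE default_recipient (message_id INTEGER, user_id INTEGER, deleted INTEGER);
CREATE TABLE default_profiles  (user_id INTEGER, display_name TEXT);

INSERT INTO default_profiles VALUES (1, 'alice'), (2, 'bob');
INSERT INTO default_messages VALUES
    (1, 1, 0, '2020-01-01'),   -- sent by user 1, kept by the sender
    (2, 1, 1, '2020-01-02'),   -- sent by user 1, deleted by the sender
    (3, 2, 0, '2020-01-03'),   -- sent by user 2
    (4, 2, 0, '2020-01-04');   -- sent by user 2
INSERT INTO default_recipient VALUES
    (1, 2, 0),
    (2, 2, 0),
    (3, 1, 0),                 -- received by user 1, kept
    (4, 1, 1);                 -- received by user 1, deleted by the recipient
""")

VISIBLE_SQL = """
SELECT msg.id AS message_id
FROM default_messages msg
    LEFT JOIN default_recipient rcp ON (msg.id = rcp.message_id)
    LEFT JOIN default_profiles p ON (p.user_id = msg.sender_user_id)
WHERE (msg.sender_user_id = ? AND msg.deleted = 0)
       OR (rcp.user_id = ? AND rcp.deleted = 0)
ORDER BY msg.date DESC
"""

user_id = 1
visible = [row[0] for row in conn.execute(VISIBLE_SQL, (user_id, user_id))]
print(visible)
```

Messages 2 and 4 drop out because user 1 deleted them on the sender side and the recipient side respectively, while the other party's copy is unaffected.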

MySQL, “columns partitioning” when multiple columns are individually important

You are right that MySQL checks only one partition for a specific sender_id but checks all the partitions for a specific receiver_id, as shown here:

mysql> explain partitions select * from messages where sender_id =5;
+----+-------------+----------+------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table    | partitions | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | messages | p0         | ALL  | NULL          | NULL | NULL    | NULL |    2 | Using where |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)

mysql> explain partitions select * from messages where receiver_id =5;
+----+-------------+----------+------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table    | partitions | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | messages | p0,p1,p3   | ALL  | NULL          | NULL | NULL    | NULL |    3 | Using where |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)

However, there are still benefits to that partitioning, depending on your hardware. When looking in all partitions for a receiver_id, MySQL is effectively performing three SELECT statements, one for each partition. It may be able to parallelize these SELECT statements. Additionally, if you index receiver_id, it will be accessing three smaller indexes.

In the end, you have to run performance tests and see whether it pays off for your use case. Since 100 MB fits in RAM pretty easily these days, I wouldn't consider partitioning such a small table unless you have specific reasons to do so.
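The pruning behaviour in the two EXPLAIN outputs can be illustrated with a toy model in Python. This is only a sketch of the idea, not of MySQL internals: it assumes rows are assigned to partitions by sender_id modulo a fixed partition count, which is not necessarily the partitioning scheme used for the table above, and all names in it are made up.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed for the sketch


def partition_for(sender_id):
    # Toy partitioning function: the partition depends on sender_id only.
    return sender_id % NUM_PARTITIONS


def build_partitions(rows):
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_for(row["sender_id"])].append(row)
    return partitions


def find_by_sender(partitions, sender_id):
    # Pruning: the partition key is known, so only one partition is read.
    pid = partition_for(sender_id)
    hits = [r for r in partitions.get(pid, []) if r["sender_id"] == sender_id]
    return hits, [pid]


def find_by_receiver(partitions, receiver_id):
    # No pruning: receiver_id says nothing about the partition,
    # so every non-empty partition has to be checked.
    scanned = sorted(partitions)
    hits = [r for pid in scanned for r in partitions[pid]
            if r["receiver_id"] == receiver_id]
    return hits, scanned


rows = [
    {"sender_id": 5, "receiver_id": 1},
    {"sender_id": 4, "receiver_id": 5},
    {"sender_id": 7, "receiver_id": 5},
    {"sender_id": 8, "receiver_id": 5},
]
parts = build_partitions(rows)

sender_hits, sender_scanned = find_by_sender(parts, 5)
receiver_hits, receiver_scanned = find_by_receiver(parts, 5)
print(sender_scanned, receiver_scanned)
```

The lookup by sender_id touches a single partition, while the lookup by receiver_id has to visit each one, which is the same asymmetry the partitions column shows above.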
