MySQL: Schema/Query Performance Approach for aggregated mailbox folders

aggregate, MySQL, schema

I am about to code a messaging system where users can write messages to other users. Users can create custom inbox folders to sort the messages they receive. In addition, every user has two main inboxes:

a) one with all messages they receive
b) one with only the messages they receive from users they are friends with

For every mailbox, the total and unread message counts should be displayed.

This leads me to the following tables:

 - users (id, username,...)
 - friends (user_id, friend_id)
 - messages (id, sender_id, receiver_id, read_status, mailbox_id, ...)
 - mailbox (id, name, owner_id, ...)

My question is about performance (thinking of a situation where there are many messages and users):

Is it better to calculate the total/unread values for each mailbox on every page view?

SELECT COUNT(*) AS total,
       ....
GROUP BY mailbox_id, friend_status, read_status

or

Is it better to store this information in the mailbox table in extra fields like total_msg, total_unread, total_friends, total_friends_unread and update these values on each event (new message, read message, new friends, friends break up, etc.)?

Is there a third approach worth considering?

Thanks for help!

Best Answer

The "best way" is probably the former: calculating the counts of messages and friends on each page view keeps them accurate and is easier to manage on the application side. Performance will depend on how many messages and friends there are, on the database server's configuration, and on your indexes.
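As a rough sketch of what computing the counts on each page view could look like, here is one query that returns all four numbers per mailbox. It runs against an in-memory SQLite database purely as a stand-in for MySQL. The sample rows, the index name, the `mailbox_counts` helper, and the convention that `read_status = 0` means unread are all assumptions for illustration, not part of the original schema description.

```python
import sqlite3

# In-memory SQLite is only a stand-in for MySQL so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE friends  (user_id INTEGER, friend_id INTEGER,
                       PRIMARY KEY (user_id, friend_id));
CREATE TABLE mailbox  (id INTEGER PRIMARY KEY, name TEXT, owner_id INTEGER);
CREATE TABLE messages (id INTEGER PRIMARY KEY, sender_id INTEGER,
                       receiver_id INTEGER, read_status INTEGER,
                       mailbox_id INTEGER);
-- Assumed index: supports the per-receiver filter and the grouping column.
CREATE INDEX idx_messages_receiver
    ON messages (receiver_id, mailbox_id, read_status);
""")

# Hypothetical sample data: user 1 owns two mailboxes and has user 2 as a friend.
conn.executemany("INSERT INTO mailbox VALUES (?, ?, ?)",
                 [(1, "inbox", 1), (2, "work", 1)])
conn.execute("INSERT INTO friends VALUES (1, 2)")
conn.executemany(
    "INSERT INTO messages (sender_id, receiver_id, read_status, mailbox_id) "
    "VALUES (?, ?, ?, ?)",
    [(2, 1, 0, 1), (2, 1, 1, 1), (3, 1, 0, 1), (3, 1, 0, 2)],
)


def mailbox_counts(conn, user_id):
    """Return {mailbox_id: (total, unread, friends_total, friends_unread)}."""
    rows = conn.execute(
        """
        SELECT m.mailbox_id,
               COUNT(*) AS total,
               SUM(CASE WHEN m.read_status = 0 THEN 1 ELSE 0 END) AS unread,
               SUM(CASE WHEN f.friend_id IS NOT NULL THEN 1 ELSE 0 END)
                   AS friends_total,
               SUM(CASE WHEN f.friend_id IS NOT NULL AND m.read_status = 0
                        THEN 1 ELSE 0 END) AS friends_unread
        FROM messages AS m
        LEFT JOIN friends AS f
               ON f.user_id = m.receiver_id AND f.friend_id = m.sender_id
        WHERE m.receiver_id = ?
        GROUP BY m.mailbox_id
        """,
        (user_id,),
    ).fetchall()
    return {row[0]: tuple(row[1:]) for row in rows}


counts = mailbox_counts(conn, 1)
print(counts)
```

The conditional sums let a single pass over the user's messages produce every counter, so the page needs one query rather than one per mailbox. Whether that is fast enough depends on the data volume and indexes, as noted above.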

A third solution would be to use a caching layer (like memcached) to store the values that don't change often. Memcached supports increment and decrement operations, which suit updating your unread counts. Message counts only change when new messages are received, and friend counts only change when new friends are added (possibly with the added condition that the friend has sent a message). My point is that these values don't update very frequently, so memcached is a good fit here.
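To make the caching idea a bit more concrete, here is a minimal sketch of the read-through and increment/decrement flow. `CounterCache` is a hypothetical in-process stand-in, not a real memcached client API. It only mimics two memcached behaviours: incrementing a missing key is a miss, and decrementing never goes below zero. The key format and the helper names are assumptions as well.

```python
class CounterCache:
    """Hypothetical in-process stand-in for a memcached client."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = int(value)

    def incr(self, key, delta=1):
        # As in memcached, incrementing a missing key does nothing (a miss).
        if key not in self._data:
            return None
        self._data[key] += delta
        return self._data[key]

    def decr(self, key, delta=1):
        # As in memcached, a decrement never takes the value below zero.
        if key not in self._data:
            return None
        self._data[key] = max(0, self._data[key] - delta)
        return self._data[key]


def unread_key(mailbox_id):
    return f"mailbox:{mailbox_id}:unread"


def unread_count(cache, mailbox_id, count_from_db):
    """Read the cached value; on a miss, compute it once and store it."""
    value = cache.get(unread_key(mailbox_id))
    if value is None:
        value = count_from_db(mailbox_id)
        cache.set(unread_key(mailbox_id), value)
    return value


def on_new_message(cache, mailbox_id):
    cache.incr(unread_key(mailbox_id))


def on_message_read(cache, mailbox_id):
    cache.decr(unread_key(mailbox_id))


# Usage with a fake database lookup that records how often it is called.
cache = CounterCache()
db_calls = []


def fake_count_from_db(mailbox_id):
    db_calls.append(mailbox_id)
    return 2


first = unread_count(cache, 1, fake_count_from_db)  # miss: hits the "database"
on_new_message(cache, 1)
on_message_read(cache, 1)
on_message_read(cache, 1)
after_events = unread_count(cache, 1, fake_count_from_db)  # served from cache
```

If a key has been evicted, the increment is simply a miss and the next read recomputes the count from the database, so the cache never needs to be the source of truth.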

That being said, stressing about the performance of that schema before you have factual data on how it's being used and how it performs definitely falls into the camp of "over-optimizing." Adding approach #2 or #3 later is a trivial addition once you have the system in place. Build it first, do what you believe is best, and don't over-optimize or over-engineer. Once it's online, monitor performance and make fixes as you identify bottlenecks.
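If monitoring later points to the counts as a bottleneck, approach #2 could be retrofitted roughly as follows. This is only a sketch, again using in-memory SQLite as a stand-in for MySQL: the `total_msg` and `total_unread` columns come from the question, while `deliver_message`, `mark_read`, and the `read_status = 0` means unread convention are assumptions. The friend-based counters are left out. They would need the same treatment plus updates whenever a friendship is added or removed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mailbox  (id INTEGER PRIMARY KEY, name TEXT, owner_id INTEGER,
                       total_msg    INTEGER NOT NULL DEFAULT 0,
                       total_unread INTEGER NOT NULL DEFAULT 0);
CREATE TABLE messages (id INTEGER PRIMARY KEY, sender_id INTEGER,
                       receiver_id INTEGER,
                       read_status INTEGER NOT NULL DEFAULT 0,
                       mailbox_id INTEGER);
INSERT INTO mailbox (id, name, owner_id) VALUES (1, 'inbox', 1);
""")


def deliver_message(conn, sender_id, receiver_id, mailbox_id):
    """Insert a message and bump both counters in one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT INTO messages (sender_id, receiver_id, mailbox_id) "
            "VALUES (?, ?, ?)",
            (sender_id, receiver_id, mailbox_id),
        )
        conn.execute(
            "UPDATE mailbox SET total_msg = total_msg + 1, "
            "total_unread = total_unread + 1 WHERE id = ?",
            (mailbox_id,),
        )
        return cur.lastrowid


def mark_read(conn, message_id):
    """Mark a message read; decrement only if it was actually unread."""
    with conn:
        cur = conn.execute(
            "UPDATE messages SET read_status = 1 "
            "WHERE id = ? AND read_status = 0",
            (message_id,),
        )
        if cur.rowcount == 1:
            conn.execute(
                "UPDATE mailbox SET total_unread = total_unread - 1 "
                "WHERE id = (SELECT mailbox_id FROM messages WHERE id = ?)",
                (message_id,),
            )


m1 = deliver_message(conn, 2, 1, 1)
m2 = deliver_message(conn, 3, 1, 1)
mark_read(conn, m1)
mark_read(conn, m1)  # second call is a no-op, so the counter is not decremented twice
totals = conn.execute(
    "SELECT total_msg, total_unread FROM mailbox WHERE id = 1"
).fetchone()
```

Keeping the counter update in the same transaction as the message change is what stops the stored numbers from drifting away from the real ones, which is the main maintenance cost of this approach.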