How to build a table for a private messaging system that supports replies

database-designdatabase-recommendationhierarchy

I'm writing a private messaging web application in PHP, the application allows replies such that when you view a message, you also get to see to what was that a reply to, and to what was that a reply to and to what was that a reply to and so on and so on.

I'm trying to find a right database structure that would avoid redundancy, so I'm wondering how can I link a single message to all other messages that it is replying to?

I thought about basically having a field called reply_id which would be a serialized array holding the IDs of all messages to be presented as replies

Does anyone have a suggestion on how to do this efficiently? Is my thought a good practice?

Best Answer

Hierarchic Self-Referential Data

Syntax shown using the free and open source database, PostgreSQL

You need a self-referential table, which creates hierarchy in the database. This is how it looks.

CREATE TABLE messages (
  ts_entered  timestamp DEFAULT now(),
  reply_to    int       REFERENCES messages,
  id          int       PRIMARY KEY
                        GENERATED BY DEFAULT AS IDENTITY,
  content     text
);
CREATE INDEX ON messages (reply_to, id);

Notice that reply_id reference an id on the very table. In a more enterprisey example, we'd probably make it EXCLUDE the same id.

Then we enter test data like this.

INSERT INTO messages ( reply_to, id, content )
VALUES
  ( null, 1, 'SYN'),
  ( 1,    2, 'SYN-ACK'),
  ( 2,    3, 'ACK'),
  ( null, 4, 'We should give free advertising to PLAN EXPLORER'),
  ( 4,    5, 'Admins should disclose their affiliations with products'),
  ( 5,    6, 'BANNNN'),
  ( 4,    7, 'Plan Explorer only works on one database'),
  ( 7,    8, 'BANNNN'),
  ( 4,    9, 'Plan Explorer does not support Linux'),
  ( 9,   10, 'BANNNN'),
  ( 4,   11, 'Plan Explorer only does what decent databases already do'),
  ( 11,  12, 'BANNNN')
;

Now in order to query this, we need a RECURSIVE CTE.

WITH RECURSIVE t(reply_to, id, content, root, level)
AS (
  SELECT reply_to, id, content, ARRAY[id], 0
  FROM messages
  WHERE reply_to IS NULL
  UNION ALL
    SELECT messages.reply_to, messages.id, messages.content, root + ARRAY[messages.id], t.level+1
    FROM t
    JOIN messages
      ON (messages.reply_to = t.id)
)
SELECT * FROM t
ORDER BY root;

And you're done..

 reply_to | id |                         content                          |   root    | level 
----------+----+----------------------------------------------------------+-----------+-------
          |  1 | SYN                                                      | {1}       |     0
        1 |  2 | SYN-ACK                                                  | {1,2}     |     1
        2 |  3 | ACK                                                      | {1,2,3}   |     2
          |  4 | We should give free advertising to PLAN EXPLORER         | {4}       |     0
        4 |  5 | Admins should disclose their affiliations with products  | {4,5}     |     1
        5 |  6 | BANNNN                                                   | {4,5,6}   |     2
        4 |  7 | Plan Explorer only works on one database                 | {4,7}     |     1
        7 |  8 | BANNNN                                                   | {4,7,8}   |     2
        4 |  9 | Plan Explorer does not support Linux                     | {4,9}     |     1
        9 | 10 | BANNNN                                                   | {4,9,10}  |     2
        4 | 11 | Plan Explorer only does what decent databases already do | {4,11}    |     1
       11 | 12 | BANNNN                                                   | {4,11,12} |     2
(12 rows)

Related Solutions

Priority Inbox Database Design

I think the trick to this is that it doesn't have to be real time, just eventually consistent, in which case it's straightforward enough (using SQL Server, but this applies in any DB). First a trivial table and some sample data:

create table messages 
(message_id integer, sender varchar(20), recipient varchar (20))
go

insert into messages values (1, 'Gaius', 'Octavian')
insert into messages values (2, 'Gaius', 'Octavian')
insert into messages values (3, 'Gaius', 'Octavian')
insert into messages values (4, 'Aurelius', 'Octavian')
insert into messages values (5, 'Aurelius', 'Octavian')
insert into messages values (6, 'Aurelius', 'Gaius')
insert into messages values (7, 'Aurelius', 'Gaius')
insert into messages values (8, 'Octavian', 'Gaius')
go

This is logging for every message, who sent it and who to (assuming for simplicity that the message body is stored in another table). So we can see that the top sender to Octavian is Gaius (3 messages of 5), and the top sender to Gaius is Aurelius (2 messages of 3). To query that using a CTE:

with q1 as (
  select recipient, sender, count(sender) as num_messages_from_sender,
  rank() over (partition by recipient order by count(sender) desc) as priority
  from messages group by recipient, sender)
select recipient, sender as top_sender, num_messages_from_sender 
from q1 where priority=1
go

In practice you would have a job that ran every minute (or whatever interval is best) refreshing a lookup table mapping a user to their top sender (or top n senders using where priority <= n) (or in your case, you would be tracking the senders to which they reply with another column and filtering by that).

For the sake of simplicity I have left off indexes and partitioning - they would be the key to performance of this solution. You could certainly scale this to many billions of messages on any modern DB/hardware. GMail most likely has a custom solution tho', but with 20,000 engineers Google can do that!

Mysql – What relationship type to use for this messaging system

Assuming you're not trying to keep track of threads of messages, I.E. you send a message to N people, and they can't reply:

1) user needs no references to other tables
2) a message should have an ID (obviously) and a reference to the user who sent it originally
3) you need a table that connects a single message and its recipients (users) that looks like this: message_recipients{user_id, message_id} where the primary key is the combination of the two fields. This ensures that one message can have many recipients, and that users can be the recipients of many different messages.

One user can send a message to one or many users

This is accomplished via the message_recipients table

Messages can be sent only to registered users (i.e. to an existing 'idUser' record)

You'll need to handle this in your application logic as well, but if you add (as you should) a foreign key and (depending on your DBMS) a NOT NULL restriction on each of the columns in message_recipients table, it will be possible to create a record in message_recipients without the recipient existing.

There should be tracking of sent messages (who sent the message and to whom)

This data model supports that. Example:

Find the name of all recipients of messages sent by user Mike Johnson

(edited to give better aliases to the two different uses of the user table)

SELECT recipient.firstname, recipient.lastname
FROM user recipient
JOIN message_recipient mr ON recipient.id = mr.user_id
JOIN message m ON m.id = mr.message_id
JOIN user sender ON m.user_id = sender.id
WHERE sender.firstname = 'Mike' and sender.lastname = 'Johnson'