I think the trick to this is that it doesn't have to be real time, just eventually consistent, in which case it's straightforward enough (using SQL Server, but this applies in any DB). First a trivial table and some sample data:
create table messages
(message_id integer, sender varchar(20), recipient varchar (20))
go
insert into messages values (1, 'Gaius', 'Octavian')
insert into messages values (2, 'Gaius', 'Octavian')
insert into messages values (3, 'Gaius', 'Octavian')
insert into messages values (4, 'Aurelius', 'Octavian')
insert into messages values (5, 'Aurelius', 'Octavian')
insert into messages values (6, 'Aurelius', 'Gaius')
insert into messages values (7, 'Aurelius', 'Gaius')
insert into messages values (8, 'Octavian', 'Gaius')
go
This is logging for every message, who sent it and who to (assuming for simplicity that the message body is stored in another table). So we can see that the top sender to Octavian is Gaius (3 messages of 5), and the top sender to Gaius is Aurelius (2 messages of 3). To query that using a CTE:
with q1 as (
select recipient, sender, count(sender) as num_messages_from_sender,
rank() over (partition by recipient order by count(sender) desc) as priority
from messages group by recipient, sender)
select recipient, sender as top_sender, num_messages_from_sender
from q1 where priority=1
go
In practice you would have a job that ran every minute (or whatever interval is best) refreshing a lookup table mapping a user to their top sender (or top n senders using where priority <= n
) (or in your case, you would be tracking the senders to which they reply with another column and filtering by that).
For the sake of simplicity I have left off indexes and partitioning - they would be the key to performance of this solution. You could certainly scale this to many billions of messages on any modern DB/hardware. GMail most likely has a custom solution tho', but with 20,000 engineers Google can do that!
Assuming you're not trying to keep track of threads of messages, I.E. you send a message to N people, and they can't reply:
1) user needs no references to other tables
2) a message should have an ID (obviously) and a reference to the user who sent it originally
3) you need a table that connects a single message and its recipients (users) that looks like this: message_recipients{user_id, message_id} where the primary key is the combination of the two fields. This ensures that one message can have many recipients, and that users can be the recipients of many different messages.
One user can send a message to one or many users
This is accomplished via the message_recipients table
Messages can be sent only to registered users (i.e. to an existing 'idUser' record)
You'll need to handle this in your application logic as well, but if you add (as you should) a foreign key and (depending on your DBMS) a NOT NULL restriction on each of the columns in message_recipients table, it will be possible to create a record in message_recipients without the recipient existing.
There should be tracking of sent messages (who sent the message and to whom)
This data model supports that. Example:
Find the name of all recipients of messages sent by user Mike Johnson
(edited to give better aliases to the two different uses of the user table)
SELECT recipient.firstname, recipient.lastname
FROM user recipient
JOIN message_recipient mr ON recipient.id = mr.user_id
JOIN message m ON m.id = mr.message_id
JOIN user sender ON m.user_id = sender.id
WHERE sender.firstname = 'Mike' and sender.lastname = 'Johnson'
Best Answer
Hierarchic Self-Referential Data
Syntax shown using the free and open source database, PostgreSQL
You need a self-referential table, which creates hierarchy in the database. This is how it looks.
Notice that
reply_id
reference an id on the very table. In a more enterprisey example, we'd probably make itEXCLUDE
the sameid
.Then we enter test data like this.
Now in order to query this, we need a
RECURSIVE CTE
.And you're done..