I'm developing a friendship relation in MySQL where the friend relation is mutual. If A is a friend of B, then B is a friend of A. If either user ends the friendship, the relation is dropped. I want to learn which way is better.
I have a running system:
user
-----------
userid p.k
name
friends
-------
userid
friendid
primary key (`userid`,`friendid`),
key `friendid` (`friendid`)
1 2
2 5
1 3
To get all of my friends:
SELECT u.name, IF(f.userid = $userid, f.friendid, f.userid) AS friendid
FROM friends f
INNER JOIN user u ON ( u.userid = IF(f.userid = $userid, f.friendid, f.userid) )
WHERE ( f.userid = $userid OR f.friendid = $userid )
This query works well. Maybe I could use a UNION instead. The query is more complicated than the one below, but the table contains half as many records as the one below.
Another way is to store each relation in both directions, as separate rows:
1 2
2 1
2 5
5 2
1 3
3 1
SELECT u.name, f.friendid
FROM friends f inner join user u ON ( u.userid = f.friendid )
WHERE f.userid = '$userid'
This query is simple, although the table takes twice as much space.
My concern is: assuming that there are millions of users, which way will work faster?
What are the advantages and disadvantages of both ways?
What should I keep in mind or change for these ways? And what problems can I face for both ways?
Best Answer
The first thing that catches my eye is the index setup for friends. You have this at the moment: primary key (`userid`,`friendid`) and key `friendid` (`friendid`).
When crosschecking for mutual friendship, it could incur a little expense because the userid may have to be retrieved from the table when traversing the friendid index. Perhaps you could replace that key with a unique index on both columns, led by friendid. This might remove any need to access the table, so that only the index is searched.
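A sketch of what such an index change could look like, assuming the table and key names from the question. The exact statement is an assumption, not taken from the answer. Note that with InnoDB, secondary indexes already carry the primary key columns, so the benefit described here would mainly apply to MyISAM tables.

```sql
-- Assumed DDL: swap the single-column key for a unique key that
-- covers both columns, led by friendid.
ALTER TABLE friends
    DROP KEY friendid,
    ADD UNIQUE KEY friendid (friendid, userid);
```

With this in place, a lookup by friendid can read userid straight from the index entry.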
Now, in terms of the queries, both of them may improve with the new unique index. Creating the unique index also eliminates the need to insert both (A,B) and (B,A) into the table, because (A,B) and (B,A) would be in the index anyway. Thus, the second query would not have to go through the table to see if someone is a friend of someone else because another person initiated the friendship. That way, if the friendship is broken by just one person, there are no orphaned friendships that are one-sided (seems a lot like life these days, doesn't it?).
Your first query looks like it would benefit more from the unique index. Even with millions of rows, locating friends using the indexes only would avoid touching the table. Still, since you did not present a UNION query, I would like to recommend a UNION query:
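One possible shape for such a UNION query, as a minimal sketch. It assumes the user and friends tables from the question, and uses a session variable @userid as a hypothetical stand-in for the $userid placeholder.

```sql
-- Hypothetical placeholder for the user whose friends are wanted.
SET @userid = 1;

-- Each branch reads one direction of the relation, so each can use
-- its own index: (userid, friendid) and (friendid, userid).
SELECT u.name, f.friendid
FROM (
    SELECT friendid FROM friends WHERE userid = @userid
    UNION
    SELECT userid FROM friends WHERE friendid = @userid
) f
INNER JOIN user u ON u.userid = f.friendid;
```

UNION (rather than UNION ALL) removes duplicates, so a friend appears once even if both (A,B) and (B,A) happen to be stored.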
This will let you see who the friends of each userid are.
To see all friendships, run this:
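A sketch of a query that lists every friendship from both sides, assuming the friends table from the question with only one row stored per pair:

```sql
-- Emit each stored pair in both directions; UNION drops any
-- duplicates if a pair was stored both ways.
SELECT userid, friendid FROM friends
UNION
SELECT friendid, userid FROM friends
ORDER BY userid, friendid;
```

With the three rows from the question, (1,2), (2,5) and (1,3), this returns six rows: (1,2), (1,3), (2,1), (2,5), (3,1), (5,2).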
First, here is some sample data:
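A small data set for trying the queries could be set up as sketched below. It reuses the rows from the question rather than the answer's own data. The column types and the user names are assumptions made up for illustration.

```sql
-- Assumed column types; names are hypothetical.
CREATE TABLE user (
    userid INT NOT NULL,
    name   VARCHAR(50) NOT NULL,
    PRIMARY KEY (userid)
);

CREATE TABLE friends (
    userid   INT NOT NULL,
    friendid INT NOT NULL,
    PRIMARY KEY (userid, friendid),
    UNIQUE KEY friendid (friendid, userid)
);

INSERT INTO user (userid, name)
VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol'), (5, 'Eve');

-- One row per friendship, as in the first layout from the question.
INSERT INTO friends (userid, friendid)
VALUES (1, 2), (2, 5), (1, 3);
```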
Let's look at all the relationships
Let's look at all 5 userids and see if the relationships are shown correctly:
They all look correct to me.
Now, let's use your second query to see if it matches up...
Why don't they match up? That's because I did not load the (B,A) for every (A,B). Let me load the (B,A) relationships and try your second query again.
They still don't match. That's because your second query is only checking one side.
Let's check your first query against every value with only (A,B) and not (B,A):
Your first query works well. I am sure it is benefiting from the unique index as I said earlier, but IMHO the UNION is simpler. With that unique index, it would appear to be six of one and half a dozen of the other in terms of execution and output.
You would have to benchmark your first query against my suggested UNION and see.
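Before timing anything, comparing the execution plans can be a useful first step. A minimal sketch, assuming the tables from the question and a hypothetical @userid variable in place of $userid:

```sql
SET @userid = 1;

-- Plan for the single-query form with IF() and OR.
EXPLAIN
SELECT u.name, IF(f.userid = @userid, f.friendid, f.userid) AS friendid
FROM friends f
INNER JOIN user u
    ON u.userid = IF(f.userid = @userid, f.friendid, f.userid)
WHERE f.userid = @userid OR f.friendid = @userid;

-- Plan for the UNION form.
EXPLAIN
SELECT u.name, f.friendid
FROM (
    SELECT friendid FROM friends WHERE userid = @userid
    UNION
    SELECT userid FROM friends WHERE friendid = @userid
) f
INNER JOIN user u ON u.userid = f.friendid;
```

In the output, "Using index" in the Extra column for the friends table indicates that the rows were read from the index alone. Real timings should still be taken against a data set of realistic size.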
This was a good question you asked today. +1 for your question.