MySQL – Relational Schema Problem for Friendship and Messages with Big Data Manipulation

database-designMySQLschema

Before you start suggesting this question for duplicate read it, I have searched and i haven't found any proper solution.

So, i'm building a social networking site (I know that you are laughing right now)
and i want to save the friendship states and the messages that users send everyday.

Now let me explain the problem more clearly, I thought that if i use a table for the friendships and have 40K users (and the average user have 50 friendships) that means that the table is going to grow to 2 million rows!

This will result to slow data processing or even to a database failure.

What if I use the file system to store the friendships instead of a Database?
I mean, i can store friendships in files.
(every user could have a "buddy.dat" file in his personal subdirectory that contains his friendships)

I have already done this thing for the messages, every user has a personal subdirectory created with a "messageges.dat" file that contains his conversations.
(This means that I'm going to have a directory with 40K personal sub-directories! 1 for every user)

Are there any consequences if I use the file system for this purpose?
(store the friendships/messages to files and then retrieve the data from there)

What is your opinion / solution?


EDIT:

I have news guys!!
I made a script to generate 40K directories in the "messages" directory and another script that generated 2 million rows in the users table!!

So, in the 40K directories case, they didn't slow down my site's scripts, it only took me half a minute to list them if i wanted with the file browser in a Debian distro.

In the mysql 2 million rows case, the 2 million rows didn't slow down my database or the site either, everything was ok.

Best Answer

Check this out, and links within (1, 2). InnoDB has a limit of 4 billion tables. There is no compelling reason to use the file system to store data that is better stored in a database.

Take a look here to see that virtually every form of social networking software uses an underlying database - normally MySQL or PostgreSQL (which I would recommend). 2 million rows is very small in todays database terms - no need to have 40.000 tables (1 for each user either) - although you may beed to look at some form of partitioning - take a look at the answer here.

Anyway, with 40.000 users, shouldn't you perhaps be thinking of using more than 1 machine? 1 guy here tried putting > 32.000 schemas onto a MySQL instance - and couldn't (follow links there also). With 40K users, put 20K in a separate schema on each server if you want to - see here also. Implicit assumption - not all of your schemas will be very active simultaneously.

Databases are designed for searching and sorting and aggregating data - file systems aren't - are you going to grep huge files? Not that it's impossible, but your time is probably better used doing other things.

Final suggestion, download some of the Open Source stuff out there and check to see what they've done - keep what you like and throw away the rest.

[EDIT in response to OP's questions]

You asked:

The private messages aren't going to be searched i guess, is it better to keep the file system for this job?

I would still hold them in the database - you never know when you're going to need them - file systems aren't generally designed for lots and lots of small files. Plus, in a file system scenario, how do you store who is replying to whom? By having yet more files? Or by putting in a header? Uggghh! A BLOB for the message with fields for replying_to_message_ID &c. - much easier!

You also asked:

As for the friendship records i will use the database, will InnoDB make the difference regarding the maximum rows and the performance?

See the post by Bill Karwin here. Bill Karwin in training manager with Percona. Anybody with Percona is likely to know a thing or two about MySQL.

People argue about the performance of MyISAM v. InnoDB - Bill Karwin again. MyISAM is old and will soon be deprecated. Don't use unless you have a compelling reason to do so (my guess is that you won't).