Assuming that you range partition your table by the (auto incrementing) sequence and you limit your query by another value (day number), you might see, that a lot of partitions will be touched when you use:
EXPLAIN PARTITIONS SELECT * FROM table_name WHERE day_number=X ORDER BY...
The reason for that is, that there is no linkage between the partitioning attribute and the data filter. Since MySQL uses a different index for each partition, it needs to open all partitions to find all data matching day_number.
If you know the partition name, maybe by querying INFORMATION_SCHEMA.PARTITIONS before the real select, and you use MySQL-5.6+, you can use partition selection like
SELECT * FROM table_name PARTITION(partition_name) WHERE day_number=X ORDER BY...
But this is just chitchat based on a lot of assumtions. To give you a real answer, you need to post the query plan and more information about table structure - like Phil said before.
Check this out, and links within (1, 2). InnoDB has a limit of 4 billion tables. There is no compelling reason to use the file system to store data that is better stored in a database.
Take a look here to see that virtually every form of social networking software uses an underlying database - normally MySQL or PostgreSQL (which I would recommend). 2 million rows is very small in todays database terms - no need to have 40.000 tables (1 for each user either) - although you may beed to look at some form of partitioning - take a look at the answer here.
Anyway, with 40.000 users, shouldn't you perhaps be thinking of using more than 1 machine?
1 guy here tried putting > 32.000 schemas onto a MySQL instance - and couldn't (follow links there also). With 40K users, put 20K in a separate schema on each server if you want to - see here also. Implicit assumption - not all of your schemas will be very active simultaneously.
Databases are designed for searching and sorting and aggregating data - file systems aren't - are you going to grep huge files? Not that it's impossible, but your time is probably better used doing other things.
Final suggestion, download some of the Open Source stuff out there and check to see what they've done - keep what you like and throw away the rest.
[EDIT in response to OP's questions]
You asked:
The private messages aren't going to be searched i guess, is it better
to keep the file system for this job?
I would still hold them in the database - you never know when you're
going to need them - file systems aren't generally designed for lots
and lots of small files. Plus, in a file system scenario, how do you store who is replying to whom? By having yet more files? Or by putting in a header? Uggghh! A BLOB for the message with fields for replying_to_message_ID &c. - much easier!
You also asked:
As for the friendship records i will use the database, will InnoDB
make the difference regarding the maximum rows and the performance?
See the post by Bill Karwin here. Bill Karwin in training manager with Percona. Anybody with Percona is likely to know a thing or two about MySQL.
People argue about the performance of MyISAM v. InnoDB - Bill Karwin again.
MyISAM is old and will soon be deprecated. Don't use unless you have a compelling reason to do so (my guess is that you won't).
Best Answer
Do not use
ENGINE=MyISAM
, that will lead to blobs being costly for any operation on the table. Instead, use InnoDB."Vertical partitioning" is generally not needed, because of the way InnoDB effectively does it for you....
With InnoDB, there are 4
ROW_FORMATs
; let's try to pick the optimal one. In all cases, some, or all, of each blob is stored away from the row itself. More in a minute.But first, let's check the queries. Do not use
SELECT *
; always specify only the necessary columns. This avoids reaching into other blocks to get the blobs.The
ROW_FORMATs
are redundant, compact, dynamic, and compressed. For your case, I recommend eitherDYNAMIC
orCOMPRESSED
. These store none of the blob in the record itself; instead they have a 20-byte pointer to where the blob is stored. Only if you need to touch the blob does it run off to that location.COMPRESSED
is the same asDYNAMIC
, except for a complicated block-level compression technique. Most benchmarks I see say you get about 2x compression. This is not as good as you might get by compressing the blobs in the client. For example, most 'text' compresses 3x. But most image formats are already compressed, so compression is a waste.So, I leave it to you to decide between dynamic and compressed. If in doubt, test with your data and your queries.
You should have
innodb_file_per_table=1
when youCREATE
the table. You may needBARRACUDA
and you do need at least version 5.6; 5.7 would be better.Two tables, one for raw, one for combined? Maybe. Given the benefits described above, there may not be any benefit in having two table. However, if there is a lot of activity in one column, it might be better to move it to a separate table (plus 1:1 id, etc). (This is especially important for things like "like/unlike" counters.)
Since you are working with bulky things, be careful that you do not have long-running transactions. Such can lead to timeouts, etc. Still, use
BEGIN...COMMIT
where appropriate.