I have a big MySQL 5.6 Inno DB table that contains user's request.
A table row updates only once, shortly-after the initial insert!
(The initial insert is to get the unique auto-incrementing request_id in order to do the processing of the request)
CREATE TABLE
requests
(
id
bigint(20) NOT NULL AUTO_INCREMENT,
user_id
BIGINT NOT NULL,
date
timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
request
text
data
text,) ENGINE InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1
psuedo script code goes like this:
execute("INSERT INTO requests (id) VALUES (default)")
my $id = query("SELECT LAST_INSERT_ID()")
my $data = do_some_fast_processing($id, $query)
execute(qq# UPDATE request SET query="$query", data=$data WHERE id=$id #)
There are roughly 10 million request per day. Meaning 10 million instance of "insert-and-then-update".
Assume I shard this into 2(odd/even user id), And assume both shards will do half of each day's request, would I gain a significant/worthy write performances? because of the lack of locking? Also assume my machine IO isn't maxed.
Best Answer
Is there a problem? Or are you expecting to grow significantly? 10M/day = 120/second, which is high, but not necessarily the limit.
innodb_flush_log_at_trx_commit = 1 is the safest, but it is the slowest. A value of 2 will give you a boost in performance.
Batching INSERTs is also a performance boost; however you may not be able to do so because of how your app works. Also combining statements into a transaction would help -- if practical.
What you have described of your app sounds like all the activity is concentrated near the 'end' of the one table. This implies that there is very little I/O (other than transaction logging, and that was what I was addressing above).
Back to your question...
Sharding across N servers will cut the I/O, locking, CPU, etc by a factor of nearly N.
The process that decides which shard to go to should be another machine, and the shards should be alone on each of the machines.
A problem with sharding to handle growth is what to do when you need more than N machines.