Mysql – Is a update-only-once-row table worth sharding

innodbMySQLsharding

I have a big MySQL 5.6 Inno DB table that contains user's request.

A table row updates only once, shortly-after the initial insert!

(The initial insert is to get the unique auto-incrementing request_id in order to do the processing of the request)

CREATE TABLE requests (

id bigint(20) NOT NULL AUTO_INCREMENT,

user_id BIGINT NOT NULL,

date timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,

request text

data text,

) ENGINE InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1

psuedo script code goes like this:

execute("INSERT INTO requests (id) VALUES (default)")

my $id = query("SELECT LAST_INSERT_ID()")

my $data = do_some_fast_processing($id, $query)

execute(qq# UPDATE request SET query="$query", data=$data WHERE id=$id #)

There are roughly 10 million request per day. Meaning 10 million instance of "insert-and-then-update".
Assume I shard this into 2(odd/even user id), And assume both shards will do half of each day's request, would I gain a significant/worthy write performances? because of the lack of locking? Also assume my machine IO isn't maxed.

Best Answer

Is there a problem? Or are you expecting to grow significantly? 10M/day = 120/second, which is high, but not necessarily the limit.

innodb_flush_log_at_trx_commit = 1 is the safest, but it is the slowest. A value of 2 will give you a boost in performance.

Batching INSERTs is also a performance boost; however you may not be able to do so because of how your app works. Also combining statements into a transaction would help -- if practical.

What you have described of your app sounds like all the activity is concentrated near the 'end' of the one table. This implies that there is very little I/O (other than transaction logging, and that was what I was addressing above).

Back to your question...

Sharding across N servers will cut the I/O, locking, CPU, etc by a factor of nearly N.

The process that decides which shard to go to should be another machine, and the shards should be alone on each of the machines.

A problem with sharding to handle growth is what to do when you need more than N machines.