Mysql – Large INSERTs performance optimization

amazon-rdsinnodbinsertMySQLoptimization

I have 15 Amazon AWS EC2 t1.micro instances which simultaneously populate Amazon RDS MySQL d2.m2.xlarge database with data using large INSERTs (40000 rows in query).

The queries are sent continuously. The table is INNODB, two INT columns, there is index for both columns. CPU utilization of RDS instance is about 30% during data receiving.

When I have one EC2 instance, the speed is in orders of magnitude faster than I run 15 instances simultaneously. In light of this, the 15-instances group works slower and slower until the speed becomes totally unsatisfactory.

How can I optimize performance for this process?

UPDATE

My SHOW CREATE TABLE result is the following:

CREATE TABLE `UserData` (
 `uid` int(11) NOT NULL,
 `data` int(11) NOT NULL,
 PRIMARY KEY (`uid`,`data`),
 KEY `uid` (`uid`),
 KEY `data` (`data`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

I need 2 indexes because it is necessary for me to fetch data by uid and by data value.

I insert data with INSERT INTO UserData (uid, data) VALUES (1,2),(1,3),(1,10),... with 40000 (uid,data) pairs.

15 parallel instances insert ~121 000 000 rows in 2 hours, but I am sure that it can be much more faster.

Best Answer

Two hints for you:

  1. The KEY uid is redundant, because it is covered by the PRIMARY KEY

  2. 40,000 rows at a time might make for too large a transaction. Although these are very small rows (two INTs) this may cause the transaction to go to disk, depending on your settings. I usually go with around 1,000 rows at a time (I go as low as 100 and as high as 10,000). Please try doing 40 * 1,000 and see if this works better for you than 1 * 40,000