I have 15 Amazon AWS EC2 t1.micro instances which simultaneously populate Amazon RDS MySQL d2.m2.xlarge database with data using large INSERTs (40000 rows in query).
The queries are sent continuously. The table is INNODB, two INT columns, there is index for both columns. CPU utilization of RDS instance is about 30% during data receiving.
When I have one EC2 instance, the speed is in orders of magnitude faster than I run 15 instances simultaneously. In light of this, the 15-instances group works slower and slower until the speed becomes totally unsatisfactory.
How can I optimize performance for this process?
UPDATE
My SHOW CREATE TABLE
result is the following:
CREATE TABLE `UserData` (
`uid` int(11) NOT NULL,
`data` int(11) NOT NULL,
PRIMARY KEY (`uid`,`data`),
KEY `uid` (`uid`),
KEY `data` (`data`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
I need 2 indexes because it is necessary for me to fetch data by uid
and by data
value.
I insert data with INSERT INTO UserData (uid, data) VALUES (1,2),(1,3),(1,10),...
with 40000 (uid,data)
pairs.
15 parallel instances insert ~121 000 000 rows in 2 hours, but I am sure that it can be much more faster.
Best Answer
Two hints for you:
The
KEY uid
is redundant, because it is covered by thePRIMARY KEY
40,000
rows at a time might make for too large a transaction. Although these are very small rows (twoINT
s) this may cause the transaction to go to disk, depending on your settings. I usually go with around1,000
rows at a time (I go as low as100
and as high as10,000
). Please try doing 40 * 1,000 and see if this works better for you than 1 * 40,000