MySQL – Randomly Slow Inserts with InnoDB

amazon-rdsinnodbMySQLperformance

TL;DR : How can I improve insertion time in my table so that every inserts take under 100ms to perform?

This table is used by a webservice which receives transactions one by one so I can't group inserts. The timeout of the webservice must be under 150ms.

  • Infrastructure : AWS
  • Application (php) : EC2 t2.micro
  • Mysql : RDS t2.small – 100GB magnetic storage

Here are the measurments I've done in production

datadog

Top left hand corner : We can see that a lot of inserts take more than 500ms and some even take more than 1s. Only a few are under 50ms. (red bars are 95th percentile)

Top right hand corner : RDS cpu is always low (memory usage is also constant and fine)

Bottom left hand corner : Number of inserts in the table

Bottom right hand corner : blue line is write IOPS, red line write lentency, blue bars is disk_queue_depth * 100. Disk queue is pretty much always below 1. We can see huge spikes in write lentency when everything else looks fine : no increase in IOPS nor in hardware ressource consumption.

I've created a script to measure insertion time in account_transaction in isolation :

// pseudo code
for ($i=0; $i<100; $i++) {
    $ts=microtime(true);
    $db->insert(data);
    echo microtime(true)-$ts;
}

Output for 100 inserts : (sorted)

0.012468814849854
0.012633085250854
0.012740135192871
0.012858867645264
0.012872934341431
...
...
...
0.038048028945923
0.040112018585205
0.040735960006714
0.042028903961182
0.42462396621704
Avg : 0.025037434101105

I pretty much always end up with the same results : 1% or 2% of the inserts are very slow. Same if i perform 1000 inserts.


What I've tried

  • disable all indexes except PK : no improvements.
  • rebuild all indexes : no improvements.
  • innodb_flush_log_at_trx_commit=0 : great improvements :
    • slow inserts totally disappeared
    • average insert time 5 time faster
  • switch to a bigger RDS (m4.large with 100GB SSD with 1000 IOPS preprovisioed) : some improvements :
    • slow inserts almost disappeared : maybe 0.1% if any and it rarely exceeds 100ms
    • average insert time 3 time faster

Unfortunately innodb_flush_log_at_trx_commit=0 isn't an option as I can't afford to lose any transaction.

My thoughts

I would expect inserts to be pretty fast most of the time (<1ms) even on basic hardware. Table only has 4 indexes and 200k rows. Am I wrong to assume that?

I suspect the bottleneck to be in the I/O realm because of the test with innodb_flush_log_at_trx_commit=0 but the metrics I've got from AWS doesn't seem to confirm that. Am I wrong again?

Table account_transaction

CREATE TABLE `account_transaction` (
  `id` char(21) COLLATE utf8_unicode_ci NOT NULL,
  `seqid` int(9) unsigned NOT NULL AUTO_INCREMENT,
  `account_id` char(21) COLLATE utf8_unicode_ci NOT NULL,
  `token` bigint(20) unsigned DEFAULT NULL,
  ...
  ...
  `creation_date` datetime NOT NULL,
  `processing_date` datetime DEFAULT NULL,
  PRIMARY KEY (`seqid`),
  UNIQUE KEY `id` (`id`),
  KEY `consolidation_idx` (`trans_link`,`token`),
  KEY `idx_token` (`token`)
) ENGINE=InnoDB AUTO_INCREMENT=239304 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

id look like 1473430240HBCRSPL9FPB = unix epoch + 10 random chars : we use this format vs a standard UUID to preserve good insert performance. Always 21 char.

seqid is used to keep track of the row number. Our id has a random part so we can't use it for that purpose.

select count(1) from account_transaction = 239k

Questions

  1. Is there a way to remove the high lentency spikes?
  2. Should i go with bigger hardware to improve performance? My traffic is really low…
  3. I must ensure that 99.9% of inserts are done under 100ms. Is that realistic on RDS mysql?

Best Answer

  • I would suggest that you are 12ms away (latency) from the AWS datacenter. You are stuck with that overhead. Conclusion: Beware of the timings.

  • The "<1ms" is realistic only if client and server are in the same datacenter and the disk is SSD. What is "magnetic storage"? Perhaps that is an old, spinning, disk. In that case 10ms is a reasonable guess for one write. And it is hard to determine from the available numbers what the latency really is. (Presumably less than 12ms.) Conclusion: Be sure to use SSDs.

  • What does Amazon say about interference from other instances on the same hardware? You have a "small" instance; there are probably many others sharing the same I/O. If one of them has a spike, you might be slowed down. With "large", you might have fewer competing customers. And it might be SSD. Conclusion: Focus on average or 95th percentile, not spikes.

  • innodb_flush_log_at_trx_commit=1 requires at least 1 disk write per transaction. =0 delays and combines writes, hence the noticeable speedup. One way to soften it is to batch multiple queries into a single BEGIN...COMMIT. (Only do such where the logic makes sense.) Conclusion: Bigger transactions. (However, you will still be stuck with 12+ms per transaction instead of per insert.)

  • Updates to indexes (except for UNIQUE indexes) are delayed and can usually be treated as zero impact.

  • char(21) COLLATE utf8_unicode_ci -- Is it always 21 characters? If not, use VARCHAR. Perhaps it is always ascii, not utf8? If so, use CHARACTER SET ascii.

  • PRIMARY KEY (seqid), UNIQUE KEYid(id) -- Why have both? As already mentioned, it causes overhead included in the INSERT time, namely to verify uniqueness before committing.

  • The activity seems to be very low. Normally people ask questions like yours when they are having trouble with hundreds or thousands of queries per second.

  • Here's a way to get a 16-byte, chronologically ordered, UUID: http://mysql.rjweb.org/doc.php/uuid . (But not as easily displayed.)