Mysql – Which MySQL server parameters should be adjusted to improve INSERT performance for a large MyISAM table

insert, myisam, MySQL, performance

I have a MyISAM table with roughly 20M rows for which inserting 1000 rows takes about 40 seconds on the production server (Ubuntu 12.04, MySQL 5.6.35). On the test machine (Windows 8.1, MySQL 5.6.22), the same insert takes less than 1 second, which seems far more reasonable to me. On the production server, inserts of ~1000 rows happen every 5 minutes. Which my.cnf configuration values should I try to change to improve performance on the production server?

Note: I've already increased key_buffer_size=256M (and even to key_buffer_size=1024M) which didn't change anything.
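For reference, the MyISAM-related settings most often tuned for insert-heavy workloads would sit in my.cnf roughly like this (the values are illustrative placeholders, not recommendations for this particular server):

```ini
[mysqld]
# Cache for MyISAM index blocks; MyISAM data blocks rely on the OS file cache
key_buffer_size = 256M
# Per-session cache that speeds up multi-row INSERTs into MyISAM tables
bulk_insert_buffer_size = 64M
# Buffer used when MyISAM indexes are rebuilt (REPAIR TABLE, ALTER TABLE)
myisam_sort_buffer_size = 128M
```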

This is what the table definition looks like:

CREATE TABLE `mytable` (
  `id1` varchar(40) NOT NULL,
  `t` datetime NOT NULL,
  `id2` varchar(12) NOT NULL,
  `a` varchar(8) NOT NULL,
  `b` int(11) NOT NULL,
  `f1` varchar(4) DEFAULT NULL,
  `f2` varchar(5) DEFAULT NULL,
  `f3` varchar(16) DEFAULT NULL,
  `f4` varchar(16) DEFAULT NULL,
  `f5` tinyint(4) DEFAULT NULL,
  `f6` tinyint(4) DEFAULT NULL,
  `f7` tinyint(4) DEFAULT NULL,
  `f8` tinyint(4) DEFAULT NULL,
  `f9` tinyint(4) DEFAULT NULL,
  `f10` tinyint(4) DEFAULT NULL,
  `f11` tinyint(4) DEFAULT NULL,
  `f12` tinyint(4) DEFAULT NULL,
  `f13` varchar(8) DEFAULT NULL,
  `f14` int(11) DEFAULT NULL,
  UNIQUE KEY `id1` (`id1`,`t`,`id2`),
  KEY `t` (`t`),
  KEY `id2` (`id2`),
  KEY `a` (`a`),
  KEY `b` (`b`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

id1 and id2 are random values; t is a timestamp representing web users' local times, so from a coarse perspective it is constantly increasing, but within a batch of 1000 rows the times may vary by +/- 1 day.

Update 1

Rows will only be inserted into the table, never updated. Hence, there is no fragmentation, as expected:

mysql> select ENGINE, TABLE_SCHEMA, TABLE_NAME, DATA_LENGTH, INDEX_LENGTH, DATA_FREE from information_schema.tables;
+--------+--------------+----------------+-------------+--------------+-----------+
| ENGINE | TABLE_SCHEMA | TABLE_NAME     | DATA_LENGTH | INDEX_LENGTH | DATA_FREE |
+--------+--------------+----------------+-------------+--------------+-----------+
| ...                                                                             |
| MyISAM | products     | mytable        |  2301965084 |   2030800896 |         0 |
| ...                                                                             |
+--------+--------------+----------------+-------------+--------------+-----------+

Update 2

I've 'cloned' mytable to mytable2 for testing purposes and am running test INSERTs into mytable2. In general I'm seeing better performance here: roughly 8 seconds for inserting 1000 rows (after some initial startup time). So (1) it looks like I'm encountering OS-level caching issues with my production table, and (2) when I change the random id1 to an increasing value, inserting 1000 rows takes roughly 400ms.

Update 3

After changing indexes to:

  UNIQUE KEY `t` (`t`,`id1`,`id2`),
  KEY `id1` (`id1`),
  KEY `id2` (`id2`)

inserting 1000 rows with random t, id1 and id2 values takes roughly 500ms after some initial startup time: the first batch took 40000ms, the second 18000ms, …, the 10th batch 500ms. Hence the initial question comes up again: is there any my.cnf setting that would reduce this startup delay?

Best Answer

A long list of possible improvements:

Turn off updating "atime" on the filesystem?
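On Linux this is a mount option rather than a MySQL setting; a hypothetical /etc/fstab entry for the data directory's filesystem with atime updates disabled might look like this (device and mount point are made up):

```
# noatime stops the kernel from writing an access-time update on every read
/dev/sda1  /var/lib/mysql  ext4  defaults,noatime  0  2
```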

INDEX(id1) can be DROPped since there is another index starting with that column. This would decrease the number of indexes to update from 6 to 5 -- a small speedup.
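If you take this suggestion, dropping the index is a single statement; the table and index names here match the schema shown in the question:

```sql
-- Drop the single-column index; per the suggestion above, a composite
-- index starting with the same column can serve those lookups
ALTER TABLE mytable DROP INDEX id1;
```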

Toss any other indexes you are not using.

Use NOT NULL where appropriate. I doubt you meant for all of those columns to be NULLable.
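Tightening a column is one ALTER per column; which columns (and what default) are appropriate is for you to decide -- f5 here is just an example from the schema:

```sql
-- Illustrative: disallow NULL in one tinyint column; note that a
-- NOT NULL column needs a default (or explicit values on every INSERT)
ALTER TABLE mytable MODIFY f5 tinyint(4) NOT NULL DEFAULT 0;
```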

With a 1G key_buffer, I hope that you have at least 6G of RAM. If not, decrease that setting.

Increasing the key_buffer will not instantly improve the index caching -- it takes time for blocks to get into the key_buffer and become useful.
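If the startup delay you measured is mostly cold index blocks, MyISAM also lets you preload an index into the key cache explicitly instead of waiting for it to warm up through normal use:

```sql
-- Preload mytable's index blocks into the default key cache;
-- IGNORE LEAVES loads only the non-leaf blocks, saving cache space
LOAD INDEX INTO CACHE mytable IGNORE LEAVES;
```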

If practical, "batch" the inserts. That is,

INSERT INTO mytable (...) VALUES (...), (...), ...;

Optimal is to batch 100-1000 rows at a time.
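A filled-in sketch of such a multi-row INSERT against the schema above (all values are made up):

```sql
-- One statement, many rows: the server parses once and updates
-- indexes more efficiently than with 1000 single-row INSERTs
INSERT INTO mytable (id1, t, id2, a, b)
VALUES
  ('6f1d2c0a', '2017-03-01 10:00:00', 'AAAABBBBCCCC', 'web', 1),
  ('8c2e9b41', '2017-03-01 10:00:05', 'DDDDEEEEFFFF', 'web', 2),
  ('9a3f7d55', '2017-03-01 10:00:09', 'GGGGHHHHIIII', 'app', 3);
```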

Switch to InnoDB.
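The conversion itself is one statement, though it rebuilds the whole table; afterwards the relevant cache is innodb_buffer_pool_size rather than key_buffer_size:

```sql
-- Rebuilds the table in the InnoDB engine; expect significant
-- time and temporary disk usage on a 20M-row table
ALTER TABLE mytable ENGINE=InnoDB;
```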

What will you do with the data? (Usually the performance problems occur on the read side.)

Two Hardware suggestions: SSD; RAID-striping with write buffer.