MySQL/InnoDB: 10 MB/s write speed after 48 hours of optimizing

Tags: innodb, mariadb, MySQL

I run this on an 8-core server with 2 TB of NVMe storage and 64 GB of RAM.
The disks are extremely fast: 1.1 GB/s sequential and 70-100k IOPS full duplex.

Because performance with MySQL 5.7 was so poor, I installed MariaDB 10.3.8 in a slim Docker container.

In total I need to write tables of about 2 TB and a billion rows. To be clear, though: this slow performance happens on an empty disk, from the first few thousand rows onward; it is not caused by a large table.

I invested roughly 50 hours of work into this over the past week, day and night. I read every documentation page I could find and hundreds of guides and questions on various platforms.
I tested it all, in almost any combination you could think of.
I tried it purely memory-buffered, purely disk-buffered, with and without large logs and log buffers, with various flush methods, with no flushing, and every other setting you can think of.

I tested importing using:
mydumper, the mysql console client, mysqlimport, LOAD DATA INFILE, PHP inserts, and multithreaded PDO scripts I wrote.

I tested tables with indexes, without indexes, and with only a primary key.

I tried importing with and without transactions, and with both single-row and multi-row INSERTs.
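For comparison, this is roughly what a batched multi-row INSERT looks like when built from application code. It is a minimal sketch with a hypothetical helper `build_multirow_insert`, assuming a Python DB-API driver that uses `%s` placeholders (PyMySQL, for example); it is not the PDO code from the question. Each statement still has to fit within `max_allowed_packet`, so very large batches need to be split.

```python
from typing import Any, List, Sequence, Tuple


def build_multirow_insert(table: str, columns: Sequence[str],
                          rows: Sequence[Sequence[Any]]) -> Tuple[str, List[Any]]:
    """Build one INSERT ... VALUES (...), (...) statement for many rows.

    Values are returned separately as a flat parameter list so the driver
    can escape them; the SQL only contains %s placeholders.
    """
    if not rows:
        raise ValueError("need at least one row")
    width = len(columns)
    for row in rows:
        if len(row) != width:
            raise ValueError("every row must have one value per column")
    col_list = ", ".join(f"`{c}`" for c in columns)
    one_row = "(" + ", ".join(["%s"] * width) + ")"
    sql = f"INSERT INTO `{table}` ({col_list}) VALUES " + ", ".join([one_row] * len(rows))
    params = [value for row in rows for value in row]  # row-major order
    return sql, params


if __name__ == "__main__":
    sql, params = build_multirow_insert(
        "eval", ["first_name", "last_name"], [("Ann", "Lee"), ("Bo", "Kim")])
    print(sql)
    print(params)
```

With a PyMySQL cursor this would be used as `cursor.execute(sql, params)`, committing once per batch rather than once per row.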

I tried different table layouts, usually with 20-30 columns, mostly VARCHARs and a few DATETIMEs.

Performance is 3-5k rows/second single-threaded and (ridiculously) only 10-25k rows/second multithreaded.
The CPU and disk are mostly idle the whole time; iostat shows 3-20 MB/s of writes, usually around 7-12 MB/s, depending on which settings I try.

So it is about 100 times slower than it should be, and there is nothing obvious holding it back.
This is the current configuration:

innodb_buffer_pool_size = 14G
innodb_buffer_pool_chunk_size=1G
innodb_log_buffer_size  = 32M
innodb_file_per_table   = 1
innodb_open_files       = 600
#innodb_flush_method    = O_DIRECT
innodb_flush_method     = O_DSYNC 
innodb_log_file_size    = 512M
innodb_io_capacity=800 
innodb_io_capacity_max=3000 
innodb_flush_neighbors=0
innodb_write_io_threads=8 
innodb_read_io_threads=8 
innodb_change_buffer_max_size=70
innodb_doublewrite=0 # corruption risk

I have tried virtually every combination imaginable.

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda              0.50         0.00        19.00          0         76
xvdg            102.50     13120.00         0.00      52480          0
xvdh           1381.50     19708.00     30984.00      78832     123936
xvdf              0.00         0.00         0.00          0          0
nvme0n1         222.00         0.00     10957.00          0      43828

The only relevant disk here is nvme0n1; it shows the current write throughput (about 11 MB/s) during the multithreaded insert.

=====================================
2018-07-22 06:42:31 0x7fe7341c9700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 39 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 4883 srv_active, 0 srv_shutdown, 1061 srv_idle
srv_master_thread log flush and writes: 5944
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 2215493
OS WAIT ARRAY INFO: signal count 3968391
RW-shared spins 0, rounds 6388873, OS waits 1674234
RW-excl spins 0, rounds 34932124, OS waits 431565
RW-sx spins 13782, rounds 169207, OS waits 2879
Spin rounds per wait: 6388873.00 RW-shared, 34932124.00 RW-excl, 12.28 RW-sx
------------
TRANSACTIONS
------------
Trx id counter 1036891
Purge done for trx's n:o < 1036891 undo n:o < 0 state: running but idle
History list length 53
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 422106005755512, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 422106005754696, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (read thread)
I/O thread 7 state: waiting for completed aio requests (read thread)
I/O thread 8 state: waiting for completed aio requests (read thread)
I/O thread 9 state: waiting for completed aio requests (read thread)
I/O thread 10 state: waiting for completed aio requests (write thread)
I/O thread 11 state: waiting for completed aio requests (write thread)
I/O thread 12 state: waiting for completed aio requests (write thread)
I/O thread 13 state: waiting for completed aio requests (write thread)
I/O thread 14 state: waiting for completed aio requests (write thread)
I/O thread 15 state: waiting for completed aio requests (write thread)
I/O thread 16 state: waiting for completed aio requests (write thread)
I/O thread 17 state: waiting for completed aio requests (write thread)
Pending normal aio reads: [0, 0, 0, 0, 0, 0, 0, 0] , aio writes: [0, 0, 0, 0, 0, 0, 0, 0] ,
 ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 0; buffer pool: 0
263597 OS file reads, 2045302 OS file writes, 161983 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 282.56 writes/s, 28.08 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 1134, seg size 1136, 0 merges
merged operations:
 insert 0, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 4425293, node heap has 1 buffer(s)
Hash table size 4425293, node heap has 4113 buffer(s)
Hash table size 4425293, node heap has 1 buffer(s)
Hash table size 4425293, node heap has 41738 buffer(s)
Hash table size 4425293, node heap has 1 buffer(s)
Hash table size 4425293, node heap has 9223 buffer(s)
Hash table size 4425293, node heap has 1 buffer(s)
Hash table size 4425293, node heap has 5900 buffer(s)
7968.05 hash searches/s, 13692.47 non-hash searches/s
---
LOG
---
Log sequence number 185044128669
Log flushed up to   185044077028
Pages flushed up to 185040571130
Last checkpoint at  185017989085
0 pending log flushes, 0 pending chkp writes
37148 log i/o's done, 6.79 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 17649631232
Dictionary memory allocated 419728
Buffer pool size   1048576
Free buffers       8032
Database pages     979545
Old database pages 361426
Modified db pages  275
Percent of dirty pages(LRU & free pages): 0.028
Max dirty pages percent: 75.000
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 817, not young 38547
0.00 youngs/s, 0.00 non-youngs/s
Pages read 263539, created 1493647, written 2008209
0.00 reads/s, 204.33 creates/s, 275.76 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 979545, unzip_LRU len: 0
I/O sum[116336]:cur[0], unzip sum[0]:cur[0]
----------------------
INDIVIDUAL BUFFER POOL INFO
----------------------
---BUFFER POOL 0
Buffer pool size   131072
Free buffers       1025
Database pages     122400
Old database pages 45162
Modified db pages  19
Percent of dirty pages(LRU & free pages): 0.015
Max dirty pages percent: 75.000
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 70, not young 6961
0.00 youngs/s, 0.00 non-youngs/s
Pages read 33027, created 186873, written 272369
0.00 reads/s, 19.72 creates/s, 32.72 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 122400, unzip_LRU len: 0
I/O sum[14542]:cur[0], unzip sum[0]:cur[0]
---BUFFER POOL 1
Buffer pool size   131072
Free buffers       1007
Database pages     122444
Old database pages 45179
Modified db pages  26
Percent of dirty pages(LRU & free pages): 0.021
Max dirty pages percent: 75.000
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 74, not young 9194
0.00 youngs/s, 0.00 non-youngs/s
Pages read 32952, created 187093, written 242787
0.00 reads/s, 22.49 creates/s, 28.87 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 122444, unzip_LRU len: 0
I/O sum[14542]:cur[0], unzip sum[0]:cur[0]
---BUFFER POOL 2
Buffer pool size   131072
Free buffers       1023
Database pages     122421
Old database pages 45170
Modified db pages  14
Percent of dirty pages(LRU & free pages): 0.011
Max dirty pages percent: 75.000
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 45, not young 177
0.00 youngs/s, 0.00 non-youngs/s
Pages read 32936, created 186494, written 236773
0.00 reads/s, 23.97 creates/s, 30.69 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 122421, unzip_LRU len: 0
I/O sum[14542]:cur[0], unzip sum[0]:cur[0]
---BUFFER POOL 3
Buffer pool size   131072
Free buffers       994
Database pages     122454
Old database pages 45182
Modified db pages  39
Percent of dirty pages(LRU & free pages): 0.032
Max dirty pages percent: 75.000
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 173, not young 254
0.00 youngs/s, 0.00 non-youngs/s
Pages read 33012, created 185887, written 258491
0.00 reads/s, 26.49 creates/s, 39.49 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 122454, unzip_LRU len: 0
I/O sum[14542]:cur[0], unzip sum[0]:cur[0]
---BUFFER POOL 4
Buffer pool size   131072
Free buffers       1006
Database pages     122449
Old database pages 45180
Modified db pages  46
Percent of dirty pages(LRU & free pages): 0.037
Max dirty pages percent: 75.000
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 120, not young 12679
0.00 youngs/s, 0.00 non-youngs/s
Pages read 32937, created 187265, written 249442
0.00 reads/s, 29.79 creates/s, 39.18 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 122449, unzip_LRU len: 0
I/O sum[14542]:cur[0], unzip sum[0]:cur[0]
---BUFFER POOL 5
Buffer pool size   131072
Free buffers       1012
Database pages     122452
Old database pages 45182
Modified db pages  25
Percent of dirty pages(LRU & free pages): 0.020
Max dirty pages percent: 75.000
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 202, not young 5093
0.00 youngs/s, 0.00 non-youngs/s
Pages read 32867, created 187025, written 253352
0.00 reads/s, 30.69 creates/s, 40.20 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 122452, unzip_LRU len: 0
I/O sum[14542]:cur[0], unzip sum[0]:cur[0]
---BUFFER POOL 6
Buffer pool size   131072
Free buffers       1021
Database pages     122423
Old database pages 45171
Modified db pages  13
Percent of dirty pages(LRU & free pages): 0.011
Max dirty pages percent: 75.000
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 64, not young 2573
0.00 youngs/s, 0.00 non-youngs/s
Pages read 32909, created 187403, written 243809
0.00 reads/s, 25.82 creates/s, 31.64 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 122423, unzip_LRU len: 0
I/O sum[14542]:cur[0], unzip sum[0]:cur[0]
---BUFFER POOL 7
Buffer pool size   131072
Free buffers       944
Database pages     122502
Old database pages 45200
Modified db pages  93
Percent of dirty pages(LRU & free pages): 0.075
Max dirty pages percent: 75.000
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 69, not young 1616
0.00 youngs/s, 0.00 non-youngs/s
Pages read 32899, created 185607, written 251186
0.00 reads/s, 25.36 creates/s, 32.97 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 122502, unzip_LRU len: 0
I/O sum[14542]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
Process ID=21173, Main thread ID=140612953028352, state: sleeping
Number of rows inserted 68789564, updated 0, deleted 0, read 0
10659.42 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
Number of system rows inserted 0, updated 0, deleted 0, read 0
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT

MySQL behaves as if it had an internal write I/O limit of around 25 MB/s.
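The per-second figures in the monitor output above can be converted into bandwidth. A minimal sketch, assuming the default `innodb_page_size` of 16 KiB (the configured value is not shown) and treating the 39-second averages as representative of the whole load:

```python
PAGE_SIZE = 16 * 1024  # assumption: default innodb_page_size of 16 KiB

# Per-second averages copied from the INNODB MONITOR OUTPUT
pages_written_per_s = 275.76
pages_created_per_s = 204.33
rows_inserted_per_s = 10659.42


def mib_per_s(pages_per_s: float, page_size: int = PAGE_SIZE) -> float:
    """Convert a page rate (pages/s) into MiB/s."""
    return pages_per_s * page_size / (1024 * 1024)


if __name__ == "__main__":
    print(f"page flush bandwidth: {mib_per_s(pages_written_per_s):.2f} MiB/s")
    print(f"new-page bandwidth:   {mib_per_s(pages_created_per_s):.2f} MiB/s")
    print(f"rows per new page:    {rows_inserted_per_s / pages_created_per_s:.1f}")
```

With these figures the buffer pool is writing roughly 4.3 MiB/s of data pages (275.76 pages/s at 16 KiB each), which suggests InnoDB itself is requesting little write I/O rather than the disk refusing it. This is an interpretation of a single snapshot, not a diagnosis.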

I know I have not included benchmarks; I have been up for 20 hours straight and don't have the results at hand. Please take my word for it that the disks are extremely fast.
The memory allocated to MySQL makes no difference: I went from 1 GB to 50 GB with no change.

I have spent half a week working 16-hour shifts to get this running, and I am at my wits' end.
The last option I can think of is buying a commercial database such as Oracle, but that's another nightmare to go through.

I got acceptable speed only once:
when copying a raw .ibd file and then using IMPORT TABLESPACE.
But that requires locking the source database for many hours to get a binary snapshot, copying the file over (which takes its time over the network), and then importing it all over again.
The IMPORT TABLESPACE step itself performed well, at about 600 MB/s.
Overall this was the fastest method, but it is unusable for my purposes.

Here is the table:

    CREATE TABLE `eval` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `intern_id` varchar(256) COLLATE utf8_bin DEFAULT NULL,
      `first_name` varchar(64) COLLATE utf8_bin DEFAULT NULL,
      `last_name` varchar(64) COLLATE utf8_bin DEFAULT NULL,
      `middle_name` varchar(64) COLLATE utf8_bin DEFAULT NULL,
      `location` varchar(196) COLLATE utf8_bin DEFAULT NULL,
      `i` varchar(128) COLLATE utf8_bin DEFAULT NULL,
      `e` varchar(256) COLLATE utf8_bin DEFAULT NULL,
      `country_code` varchar(4) COLLATE utf8_bin DEFAULT NULL,
      `country_name` varchar(64) COLLATE utf8_bin DEFAULT NULL,
      `state_name` varchar(64) COLLATE utf8_bin DEFAULT NULL,
      `city_name` varchar(64) COLLATE utf8_bin DEFAULT NULL,
      `education` varchar(256) COLLATE utf8_bin DEFAULT NULL,
      `num_c` smallint(6) DEFAULT NULL,
      `num_j` smallint(6) DEFAULT NULL,
      `j_t` varchar(256) COLLATE utf8_bin DEFAULT NULL,
      `c_name` varchar(256) COLLATE utf8_bin DEFAULT NULL,
      `e_a` varchar(256) COLLATE utf8_bin DEFAULT NULL,
      `flag_existent` tinyint(4) DEFAULT NULL COMMENT '1/0',
      `public_p_u` varchar(256) COLLATE utf8_bin DEFAULT NULL,
      `c_intern_id` varchar(256) COLLATE utf8_bin DEFAULT NULL,
      `unmatched_facts` varchar(2048) COLLATE utf8_bin DEFAULT NULL,
      `dt_snapshot` datetime DEFAULT NULL,
      `change_small` tinyint(4) DEFAULT NULL,
      `change_significant` tinyint(4) DEFAULT NULL,
      `j_t_auth` varchar(256) COLLATE utf8_bin DEFAULT NULL COMMENT 'sure about it',
      `c_name_auth` varchar(256) COLLATE utf8_bin DEFAULT NULL COMMENT 'sure about it',
      `c_intern_id_guess` varchar(256) COLLATE utf8_bin DEFAULT NULL,
      `ut_created` int(11) DEFAULT NULL,
      `reserve_int_2` int(11) DEFAULT NULL,
      `reserve_vc1` varchar(256) COLLATE utf8_bin DEFAULT NULL,
      `reserve_vc2` varchar(256) COLLATE utf8_bin DEFAULT NULL,
      `reserve_vc_3` varchar(256) COLLATE utf8_bin DEFAULT NULL,
      PRIMARY KEY (`id`),
      UNIQUE KEY `intern_id` (`intern_id`),
      KEY `location` (`location`),
      KEY `country_name` (`country_name`),
      KEY `country_name_location_notnull` (`country_name`(1),`location`(1)),
      KEY `location_country_name` (`location`,`country_name`(1)),
      KEY `location_null_country_name` (`location`(1),`country_name`),
      KEY `dt_snapshot` (`dt_snapshot`),
      KEY `state_name` (`state_name`),
      KEY `city_name` (`city_name`),
      KEY `c_name` (`c_name`),
      KEY `c_name_auth` (`c_name_auth`),
      KEY `j_t` (`j_t`)
    ) ENGINE=InnoDB AUTO_INCREMENT=19883676 DEFAULT CHARSET=utf8 COLLATE=utf8_bin

I tested importing with no indexes, with only the primary key, and with all indexes.
The speed is basically always the same.
The only difference is that disk I/O increases with more indexes, while the rows/second stay the same.
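Each secondary index is a separate B-tree in InnoDB, so every inserted row also writes one entry per index; that would fit the observation that disk I/O grows with the number of indexes. A rough sketch with a hypothetical `count_index_trees` helper that counts the trees declared in `SHOW CREATE TABLE` output; for the `eval` table it gives 1 clustered plus 12 secondary indexes. It only recognises `PRIMARY KEY`, `UNIQUE KEY` and `KEY` lines and assumes a primary key is declared. Note also that changes to non-unique secondary indexes can be deferred through the change buffer, whereas a unique key such as `intern_id` has to be checked on every insert.

```python
import re
from typing import Dict

# Index definitions as they appear in SHOW CREATE TABLE output, e.g.
#   PRIMARY KEY (`id`),   UNIQUE KEY `intern_id` (...),   KEY `location` (...)
INDEX_LINE = re.compile(r"^\s*(PRIMARY KEY|UNIQUE KEY|KEY)\s", re.IGNORECASE)


def count_index_trees(ddl: str) -> Dict[str, int]:
    """Count the B-trees InnoDB maintains for a table definition.

    Assumes the table declares a PRIMARY KEY, which becomes the single
    clustered index; every other UNIQUE KEY / KEY is a secondary B-tree
    that receives one entry per inserted row.
    """
    secondary = 0
    for line in ddl.splitlines():
        match = INDEX_LINE.match(line)
        if match and match.group(1).upper() != "PRIMARY KEY":
            secondary += 1
    return {"clustered": 1, "secondary": secondary, "total": 1 + secondary}
```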

Update
Importing the table using IMPORT TABLESPACE was fast (a horrible method: it requires deleting .ibd files, any interruption will corrupt the table, and I have to lock the primary source database for an hour).
This method achieved about 350k rows/second.

I then experimented with the loaded database on the server, using simple SELECTs that need a full table scan (count(*) where xxx is not null).
The database performs the full table scan at only 100 MB/s!
Some bottleneck is holding back 90% of the possible speed.

Update:
I tried to work around the query performance bottleneck by opening 5 database sessions, each running a SELECT on the same table, with the queries separated by ID range: 1-10000000, 1000000-2000000, 300000-4000000, etc.
Each additional session increased the disk load by 100 MB/s.
So the database was 5 times faster with 5 concurrent SELECT queries/sessions than with one query.
But this should actually be much SLOWER: it requires a lot of random I/O and allows less sequential I/O, as the 5 threads rapidly access the same file at different positions.
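The range-splitting workaround can be sketched as follows. The helper names `split_id_range` and `parallel_count` are hypothetical, and `run_query` stands in for a function that opens its own connection per session (connections should not be shared across threads) and runs something like `SELECT COUNT(*) FROM eval WHERE id BETWEEN %s AND %s AND some_column IS NOT NULL`, where `some_column` is a placeholder. The chunks are contiguous and non-overlapping, so the partial counts can simply be summed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def split_id_range(first_id: int, last_id: int, parts: int) -> List[Tuple[int, int]]:
    """Split the inclusive range [first_id, last_id] into contiguous,
    non-overlapping inclusive chunks whose sizes differ by at most one."""
    if parts < 1 or last_id < first_id:
        raise ValueError("need parts >= 1 and last_id >= first_id")
    total = last_id - first_id + 1
    parts = min(parts, total)
    base, extra = divmod(total, parts)
    chunks = []
    start = first_id
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # spread the remainder
        chunks.append((start, start + size - 1))
        start += size
    return chunks


def parallel_count(run_query: Callable[[int, int], int],
                   chunks: List[Tuple[int, int]]) -> int:
    """Run one query per chunk concurrently and sum the partial counts."""
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        return sum(pool.map(lambda c: run_query(*c), chunks))


if __name__ == "__main__":
    print(split_id_range(1, 19883675, 5))
```

Threads are adequate here because each worker mostly waits on the server; the scanning work happens inside mysqld.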

I had similar problems with writes: writing to the SAME database file from 5 sessions was 5 times faster than from one, but it saturated at a very low speed (1-5% of top speed).

One SELECT on the table:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.21    0.00    0.35   10.66    0.26   87.52

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda              0.67         0.00        21.33          0         64
xvdg              0.00         0.00         0.00          0          0
xvdh            134.33         5.33       721.33         16       2164
xvdf              4.67         0.00        25.33          0         76
nvme0n1        7032.00    112512.00         0.00     337536          0

Five SELECTs on the SAME table at different primary key positions:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.98    0.00    0.63   43.35    0.77   53.28

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda              0.67         0.00        22.67          0         68
xvdg              0.00         0.00         0.00          0          0
xvdh            111.33        13.33       598.67         40       1796
xvdf              0.00         0.00         0.00          0          0
nvme0n1       30051.33    480821.33         0.00    1442464          0
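Dividing kB_read/s by tps for nvme0n1 gives the average request size. In both samples it comes out at 16 kB, which matches InnoDB's default 16 KiB page size (an assumption; the configured `innodb_page_size` is not shown). That would be consistent with each scanning session issuing one page-sized read at a time, so throughput grows with the number of sessions rather than being capped by disk bandwidth. This is a reading of the numbers, not a confirmed diagnosis.

```python
def avg_request_kb(kb_per_s: float, tps: float) -> float:
    """Average size of one transfer reported by iostat, in kB."""
    if tps <= 0:
        raise ValueError("tps must be positive")
    return kb_per_s / tps


if __name__ == "__main__":
    # nvme0n1 read figures (kB_read/s, tps) from the two iostat samples
    samples = {
        "1 SELECT": (112512.00, 7032.00),
        "5 SELECTs": (480821.33, 30051.33),
    }
    for label, (kb_read_per_s, tps) in samples.items():
        size = avg_request_kb(kb_read_per_s, tps)
        print(f"{label}: {size:.1f} kB per read, {kb_read_per_s / 1024:.0f} MiB/s")
```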

Latest conclusion
This issue seems to be a flaw in both MySQL and MariaDB, probably caused by a single-threaded design.
It seems that maximum performance drops as the number of columns increases; each column adds some overhead.
InnoDB/XtraDB does not seem to be the issue.
That is currently the only explanation I can find, and I see no solution other than writing multithreaded custom code.

All global variables and SHOW STATUS output:
https://paste.ee/p/Yk1Le

Here is the whole config file (the current variant; I think I have tried everything):

[client]
port            = 3316
socket          = /maincache/share/mysqld.sock


[mysqld_safe]
socket          = /maincache/share/mysqld.sock
nice            = 0

[mysqld]
pid-file        = /var/run/mysqld/mysqld.pid
socket          = /maincache/share/mysqld.sock
port            = 3316
basedir         = /usr
datadir         = /var/lib/mysql
tmpdir          = /instance_store/tmp
lc_messages_dir = /usr/share/mysql
lc_messages     = en_US
skip-external-locking
bind-address            = 127.0.0.1
max_connections         = 100
connect_timeout         = 5
wait_timeout            = 600
max_allowed_packet      = 160M
thread_cache_size       = 128
sort_buffer_size        = 4M
bulk_insert_buffer_size = 16M
tmp_table_size          = 16M
max_heap_table_size     = 16M
join_buffer_size=32k
sort_buffer_size=32k
myisam_recover_options = BACKUP
key_buffer_size         = 6M
table_open_cache        = 1000
table_open_cache_instances = 8
myisam_sort_buffer_size = 16M
concurrent_insert       = 2
read_buffer_size        = 2M
read_rnd_buffer_size    = 1M
query_cache_limit               = 16M
query_cache_size                = 256M
log_error = /maincache/share/cluster/mysql_error.log
slow_query_log_file     = /var/log/mysql/mariadb-slow.log
long_query_time = 10

expire_logs_days        = 10
max_binlog_size         = 100M
default_storage_engine  = InnoDB
innodb_force_recovery=1
innodb_buffer_pool_size = 10G
innodb_buffer_pool_chunk_size=512M
innodb_file_per_table   = 1
innodb_open_files       = 600
innodb_flush_method     = O_DIRECT_NO_FSYNC
innodb_log_file_size    = 512M
innodb_io_capacity=5000
innodb_io_capacity_max=80000
innodb_flush_neighbors=0
innodb_write_io_threads=64
innodb_read_io_threads=64
innodb_change_buffer_max_size=70
innodb_buffer_pool_instances=128
innodb_thread_concurrency=144

[galera]

[mysqldump]
quick
quote-names
max_allowed_packet      = 16M

[mysql]

[isamchk]
key_buffer              = 16M

The server has 64 GB of RAM, but I limited the MySQL server to 10 GB in this variant.
It made absolutely no difference to performance, regardless of the InnoDB buffer size.
The server is idle; even during inserts it is 80-90% idle (I/O and CPU), and of course it is not swapping.

Best Answer

Suggestions to consider for your my.cnf [mysqld] section, based on the information visible. Put the entire block at the END of [mysqld] and REMOVE any SAME-NAMED VARIABLE appearing higher in the section to avoid conflicting settings.

innodb_io_capacity=40000  # from 5000 to open the door for NVME speed
read_rnd_buffer_size=256K  # from 1M to reduce handler_read_rnd_next RPS
innodb_lru_scan_depth=128  # from 1024 to conserve CPU every second
innodb_adaptive_max_sleep_delay=10000 # from 150000 (microseconds) to shorten the maximum sleep delay
innodb_flushing_avg_loops=4  # from 30 for reduce the loop delay
innodb_thread_concurrency=0  # from 144, see dba.stackexchange Question 5666
max_seeks_for_key=32  # to limit the optimizer to 32 vs ~4 billion possible
max_write_lock_count=16  # to allow reads after 16 write locks vs up to ~4 billion
thread_concurrency=30  # from 10 for additional concurrency, may be deprecated
innodb_buffer_pool_instances=8  # from 64 see REFMAN for innodb_lru_scan_depth details
innodb_log_file_size=6G  # from ~ 512M to reduce log rotation
innodb_log_buffer_size=3G  # from 16M for ~ 30 minutes buffering
query_cache_type=OFF  # from ON no need to waste CPU for mgmt
query_cache_size=0  # from ~256M to conserve RAM and CPU cycles
slow_query_log=ON  # from OFF always good to have ON

RAM use strategy while you are loading 10,000+ rows per second

Total RAM = 64GB, allow mysqld up to 48GB (~ 75%)

While you are loading this high volume,

innodb_buffer_pool_size=30G  # for 62.5% of your 48G
innodb_change_buffer_max_size=50  # for 50% to have best insert rate per second

when the loading has completed,

SET GLOBAL innodb_change_buffer_max_size=15;  # 15% set aside for routine maintenance requirements

and, after a reasonable length of uptime, the InnoDB buffer pool will settle into holding the data you typically need.
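The RAM split above can be checked with a little arithmetic, plus one detail that affects it: InnoDB rounds `innodb_buffer_pool_size` up to a multiple of `innodb_buffer_pool_chunk_size` × `innodb_buffer_pool_instances`. The monitor output in the question is consistent with this: 1048576 pages of 16 KiB is 16 GiB, although 14G was configured with 1G chunks and 8 buffer pool instances. A minimal sketch with hypothetical helper names; with the 512M chunk size from the question's later config and the suggested 8 instances, a requested 30G would become 32G.

```python
from typing import Tuple

GIB = 1024 ** 3
MIB = 1024 ** 2


def ram_plan(total_gb: float, mysqld_share: float, pool_share: float) -> Tuple[float, float]:
    """Return (mysqld budget, buffer pool size) in GB: mysqld gets
    `mysqld_share` of total RAM, the buffer pool `pool_share` of that budget."""
    mysqld_gb = total_gb * mysqld_share
    return mysqld_gb, mysqld_gb * pool_share


def effective_pool_bytes(requested: int, chunk: int, instances: int) -> int:
    """Round a requested innodb_buffer_pool_size up to a multiple of
    innodb_buffer_pool_chunk_size * innodb_buffer_pool_instances.

    Assumes chunk * instances <= requested; in the opposite case the server
    reduces the chunk size instead, which this sketch does not model.
    """
    unit = chunk * instances
    if unit > requested:
        raise ValueError("chunk * instances exceeds the requested size")
    return (requested + unit - 1) // unit * unit  # integer ceiling to a multiple


if __name__ == "__main__":
    mysqld_gb, pool_gb = ram_plan(64, 0.75, 0.625)
    print(f"mysqld budget {mysqld_gb:.1f} GB, buffer pool {pool_gb:.1f} GB")
    size = effective_pool_bytes(int(pool_gb) * GIB, 512 * MIB, 8)
    print(f"effective innodb_buffer_pool_size: {size // GIB} GiB")
```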
