MySQL – innodb_thread_concurrency seems evil as it enlarges table size by a big factor

Tags: innodb, mariadb, MySQL

Environment: OS: MS Windows 2012 R2; CPU: Xeon, 12 logical processors; HDD block size: 8 KB; DB engine: MariaDB 10.1.23, MariaDB 10.2.6, or MySQL 5.7.18, with default settings.

Schema:

CREATE DATABASE IF NOT EXISTS `z` DEFAULT CHARACTER SET utf8;
CREATE TABLE `test` (
    `testId` BIGINT(20) NOT NULL AUTO_INCREMENT,
    PRIMARY KEY (`testId`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB;
  1. Use the mysqlslap command below to insert 1 million records without concurrency (single thread, single connection):

mysqlslap -pPutPASS --create-schema=z --number-of-queries=1000000
--query="insert into test values(null)"

test.ibd size: 36.0 MB (37,748,736 bytes)

  2. Truncate/empty the table and perform a concurrent insert as below:

mysqlslap -pPutPASS --create-schema=z --concurrency=100
--number-of-queries=1000000 --query="insert into test values(null)" --commit=10000

test.ibd size: 56.0 MB (58,720,256 bytes)

  3. Set innodb_thread_concurrency=12 and repeat step 2:

test.ibd size: 76.0 MB (79,691,776 bytes), sometimes 72 MB. I also tested with --concurrency=500 and the table size became 88.0 MB.

  4. The actual optimized size is 28 MB, obtained by running:

OPTIMIZE TABLE z.test;
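
As a side check, the logical size can be compared with the .ibd file size through information_schema (a rough measure only: data_free merely approximates allocated-but-unused space, and slack inside B-tree pages is not visible at all):

-- Compare InnoDB's reported data/index sizes with the file on disk.
SELECT table_name,
       ROUND(data_length  / 1024 / 1024, 1) AS data_mb,
       ROUND(index_length / 1024 / 1024, 1) AS index_mb,
       ROUND(data_free    / 1024 / 1024, 1) AS free_mb
FROM information_schema.TABLES
WHERE table_schema = 'z' AND table_name = 'test';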

Update 1:

The test.ibd sizes above are valid for MariaDB 10.1.23.

For MariaDB 10.2.6 test.ibd size:

  1. 36 MB (without concurrency)

  2. 88 MB (concurrency insert with innodb_thread_concurrency=0)

  3. 92 MB (innodb_thread_concurrency=12)

  4. 40 MB (optimized)

For MySQL 5.7.18 test.ibd size:

  1. 36 MB (without concurrency)

  2. 68 MB (concurrency insert with innodb_thread_concurrency=0)

  3. 72 MB (innodb_thread_concurrency=12)

  4. 40 MB (optimized)

The difference of 10.2.6 and 5.7.18 from 10.1.23 is mostly due to the Barracuda innodb_file_format, which seems worse than the old format, as the tests above show!

Questions:

  1. Why does the InnoDB table size increase by a big factor when I use concurrent insertion?
  2. Why does innodb_thread_concurrency cause the table size to triple, with nothing mentioned in the official docs about it?!
  3. What config tweaks will let MariaDB/MySQL benefit from the concurrency performance boost WHILE keeping the table size smallest? I am planning to use innodb_thread_concurrency=0 as explained in Hyperthreading & MySQL InnoDB Thread Concurrency Performance, but that's not enough: my project will have billions of records, and keeping the table size as small as possible will be a big gain for performance and a saving of H/W resources. (A minimal way to toggle the setting while testing is sketched below.)
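
Since innodb_thread_concurrency is dynamic, each test run can change it without a restart (0 means InnoDB imposes no thread limit of its own):

-- Flip the InnoDB thread limit between benchmark runs; no restart needed.
SET GLOBAL innodb_thread_concurrency = 0;
SHOW GLOBAL VARIABLES LIKE 'innodb_thread_concurrency';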

Update 2:

Until I get an answer to the above questions, I will highlight my project requirements in order to get the best configuration:

Production Environment:

OS: MS Windows 2012 R2

SSD: HPE 1.92TB SATA 6G Mixed, block size 4 KB

RAM: 768 GB

CPU: 4 Xeon Processors (Sockets) E7-8890 V3 2.5 GHZ (total cores: 72, total logical processors: 144)

Data: 150 inserts per second, 24x7, from 50 to 150 connections, single-row inserts. Availability of data is 1 year (meaning any record older than 1 year will be deleted); no updates on the rows, just insert once, then selecting/querying through a web UI. One year of data is estimated at 3-4 billion rows. The DB schema consists of one big table and 10 tables holding metadata (e.g. country), the biggest of which will hold a maximum of 3000 rows during the whole project life cycle, so the main focus is the big table. (A sketch of this shape follows.)
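
For concreteness, a minimal sketch of that shape; every table and column name here is hypothetical, only the structure comes from the requirements above:

-- Hypothetical names: one small lookup table stands in for the 10
-- metadata tables, and big_table stands in for the main table.
CREATE TABLE country (
    countryId SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
    name      VARCHAR(100) NOT NULL,
    PRIMARY KEY (countryId)
) ENGINE=InnoDB;

CREATE TABLE big_table (
    id        BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    createdAt DATETIME NOT NULL,
    countryId SMALLINT UNSIGNED NOT NULL,
    PRIMARY KEY (id),
    KEY idx_createdAt (createdAt)
) ENGINE=InnoDB;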

So far I adjusted below config params:

max_connections=501
innodb_flush_log_at_trx_commit = 0
innodb_log_buffer_size = 256M
innodb_buffer_pool_size=512G
innodb_log_file_size=1G
max_allowed_packet = 1G
innodb_flush_method=async_unbuffered
innodb_flush_neighbors=0
innodb_io_capacity=400
innodb_io_capacity_max=2000
innodb_log_compressed_pages=off
innodb_write_io_threads = 12
innodb_doublewrite = off
innodb_read_io_threads = 12
skip_name_resolve=ON
# dump/restore buffer pool, faster buffer pool warmup
innodb_buffer_pool_dump_at_shutdown = ON
innodb_buffer_pool_load_at_startup  = ON
innodb_page_size=4k
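
To confirm the values actually took effect after the restart (note that innodb_page_size is static and only applies to a freshly initialized data directory):

-- Read back the effective values from the running server.
SHOW GLOBAL VARIABLES
WHERE Variable_name IN ('innodb_buffer_pool_size', 'innodb_page_size',
                        'innodb_flush_log_at_trx_commit', 'innodb_doublewrite');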

Please suggest (with explanation) further config tweaks I need in order to get the best out of InnoDB and MariaDB 10.2.6.

Best Answer

I suspect I will get down-voted for this Answer, but I think these things need to be said. (And, no, I don't know the specific reasons for the strange sizes.)

Two simplified replies:

  • InnoDB plays a lot of games in its attempts to trade off speed and space.
  • Benchmarks rarely reflect anything in the real world.

Some factors (including possible explanations):

  • When there are multiple inserts into one table, InnoDB takes effort to minimize the impact of one insert on the other -- this may involve premature block splits, duplication of rows awaiting commits, MVCC, etc.
  • AUTO_INCREMENT is a bit tricky. Your benchmark is relatively simpleminded. In real life, there might be "burned" ids, leading to gaps in the values. This is (I assume) a deliberate efficiency measure -- gain speed, but waste ids. Note that there are various AI modes (see innodb_autoinc_lock_mode, and the check sketched after this list).
  • BTrees are inherently sloppy. When a block is "split", it becomes two half-full blocks. In the single-threaded benchmark this is not a big issue since there is no contention for the 'last' block. And InnoDB has some optimizations for such.
  • BTrees are even more sloppy in the presence of MVCC -- since multiple copies of a row live for some time.
  • BTree blocks (in InnoDB) are not necessarily recombined promptly.
  • Allocation is in pages, extents, and segments. See here. This can lead to megabytes of un-recouped space. Furthermore, there is no 'visibility' into how much space is wasted at the page/extent/segment level.
  • Concurrency of 100 (especially with only 12 processors) is asking for a lot of thrashing of locks in InnoDB. Oracle, even with 5.7, does not brag about more than 64. I find that 100 leads to InnoDB stumbling over itself, not necessarily finishing the benchmark any sooner than with a smaller number. That is, excessive concurrency can lead to terrible latency without necessarily helping throughput.
  • Committing every 10K rows (--commit=10000) may lead to excessive use of the undo buffer (etc.), thereby actually being slower than committing sooner.
  • Your "project will have billions of records" -- but how fast will they be inserted, and from how many different connections, and will they be batched or single-row inserted, etc.?
  • My take on high speed ingestion.
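
A quick way to check for such burned ids against the test table above (innodb_autoinc_lock_mode is not settable at runtime; 1, "consecutive", is the default in these versions):

-- Check which auto-inc lock mode the server was started with.
SHOW GLOBAL VARIABLES LIKE 'innodb_autoinc_lock_mode';

-- After a concurrent load starting from an empty table, a max id well
-- above the row count means ids were "burned" (skipped).
SELECT MAX(testId) AS max_id,
       COUNT(*)    AS row_cnt,
       MAX(testId) - COUNT(*) AS burned_ids
FROM z.test;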

More (after Update 2)

  • innodb_buffer_pool_instances = 16
  • innodb_doublewrite = off -- this may be safe since the drive's block size and innodb_page_size are both 4 KB, so page writes are effectively atomic.
  • With SSDs, 150 individual inserts/sec is not a problem.
  • innodb_page_size=4k -- be aware that all tables will have to live with this. It limits rows to under 2KB. BTrees will be a little deeper, but I don't see that as a problem.
  • With the "delete after a year", I strongly recommend using PARTITION BY RANGE(TO_DAYS(...)) with 14 (monthly) or 55 (weekly) partitions. This will make the deletes have essentially no impact. A DELETE, even daily, would probably cause hiccups. More discussion here .
  • How fast did the mysqlslap run (inserts/sec)? Probably a lot faster than 150. If your 150 is reasonably steady, then you will never experience concurrency=100, and possibly only rarely =10. I don't see a problem ingesting your data.
  • If you need to 'normalize' (ex: country, etc) the data as you load it, see my ingestion link.
  • What are the SELECTs like? They will be competing with the INSERTs. I suspect 200 selects/sec, if they are point queries, could coexist with the inserts. If you expect 1000, we need to think further.
  • With partitioning, if the excessive size persists, here is a fix: REORGANIZE or REBUILD (not OPTIMIZE) a weekly partition after it is fully built. That way, the excessive disk usage can be freed. Be sure to use innodb_file_per_table before creating the table. (Sketched below.)
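
A minimal sketch of that partition lifecycle, reusing the hypothetical big_table/createdAt from the question update (names are illustrative; note that MySQL requires the partition key to appear in every unique key, and ALTER TABLE ... PARTITION BY cannot be combined with other alterations):

-- The partition key must be part of the primary key.
ALTER TABLE big_table DROP PRIMARY KEY, ADD PRIMARY KEY (id, createdAt);

-- Weekly ranges; keep an empty catch-all partition at the end.
ALTER TABLE big_table PARTITION BY RANGE (TO_DAYS(createdAt)) (
    PARTITION p20170626 VALUES LESS THAN (TO_DAYS('2017-06-26')),
    PARTITION p20170703 VALUES LESS THAN (TO_DAYS('2017-07-03')),
    PARTITION pFuture   VALUES LESS THAN MAXVALUE
);

-- Once a week is fully written, rebuild just that partition to reclaim
-- the slack (much cheaper than OPTIMIZE on the whole table):
ALTER TABLE big_table REBUILD PARTITION p20170626;

-- A year later, removing that week is a near-instant metadata
-- operation instead of a multi-million-row DELETE:
ALTER TABLE big_table DROP PARTITION p20170626;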