MySQL – Optimal Database Structure for Fast Inserts

database-designMySQLperconaperformanceperformance-tuning

I am using MySQL Percona server.
This table needs to do fast bulk inserts using 'INSERT INTO … VALUES' for smaller requests (about 100k records) and using 'LOAD DATA INFILE' for larger (up to 10kk records).
Also, there will be selects with full table scan (using indexes there will be like 10-30 million of records out of total 100kk records), they also need to be done as fast as possible.
I tried myisam and got around 50k records inserted per second, with InnoDB that was only about 25k. This numbers are OK, but i would be glad, if i could improve them.

How would you organize such table and what setting to mysql server you would apply? We've got a server with 128G of RAM, i think it might be helpfull.

I did some research on this matter, but still feel myself uncompetent. If you have managed similar tables – please, share your expirience, thank you.

Best Answer

The problem with MySQL's InnoDB engine is that the storage architecture is essentially using index-organized tables, which will unfortunately experience performance issues for large numbers of inserts when the target table is large. The only case in which this can be mitigated is if you have sequential inserts, that is, the primary key of the set of inserts follows a sequence.

For high ingestion rates of data, different technologies are recommended, such as Cassandra which uses a log-structured merge tree, or in your case, I might recommend TokuDB, which has been acquired by Percona, and which uses a technology called a fractal tree index, which is essentially creative way to cache, pre-sort, and batch insert into your index-organized table (of course, there's much more to it than that, but no need to go into the details here.)

Long story short, if you have moderately high ingestion rates, you need a heap organized table structure implemented by databases like Oracle and PostgreSQL, and if you have very high ingestion rates, you need to use a database like Cassandra or TokuDB.