MySQL – Optimal Database Structure for Fast Inserts

database-designMySQLperconaperformanceperformance-tuning

I am using MySQL Percona server.
This table needs to do fast bulk inserts using 'INSERT INTO … VALUES' for smaller requests (about 100k records) and using 'LOAD DATA INFILE' for larger (up to 10kk records).
Also, there will be selects with full table scan (using indexes there will be like 10-30 million of records out of total 100kk records), they also need to be done as fast as possible.
I tried myisam and got around 50k records inserted per second, with InnoDB that was only about 25k. This numbers are OK, but i would be glad, if i could improve them.

How would you organize such table and what setting to mysql server you would apply? We've got a server with 128G of RAM, i think it might be helpfull.

I did some research on this matter, but still feel myself uncompetent. If you have managed similar tables – please, share your expirience, thank you.

Best Answer

The problem with MySQL's InnoDB engine is that the storage architecture is essentially using index-organized tables, which will unfortunately experience performance issues for large numbers of inserts when the target table is large. The only case in which this can be mitigated is if you have sequential inserts, that is, the primary key of the set of inserts follows a sequence.

For high ingestion rates of data, different technologies are recommended, such as Cassandra which uses a log-structured merge tree, or in your case, I might recommend TokuDB, which has been acquired by Percona, and which uses a technology called a fractal tree index, which is essentially creative way to cache, pre-sort, and batch insert into your index-organized table (of course, there's much more to it than that, but no need to go into the details here.)

Long story short, if you have moderately high ingestion rates, you need a heap organized table structure implemented by databases like Oracle and PostgreSQL, and if you have very high ingestion rates, you need to use a database like Cassandra or TokuDB.

Related Solutions

Mysql – Tablestructure for fast inserts/deletes with foreign keys

First, if your tables are InnoDB, and column c.P_ID is the reference of column p.P_ID (note actual column names may be different), then you absolutely should use foreign keys to avoid orphans, but know that you will have to explicitly state ON DELETE CASCADE, since the default is ON DELETE SET NULL, retaining orphans.

But this doesn't really have anything to do with the 'single query' to select the products and information about the competitors.

To achieve:

I want to achieve, that with only one query i got the product information from p an also the competitor ones from c where the product id is the same.

your query would use a join:

SELECT products.P_ID, products.product_name, competitors.info
 FROM products
 LEFT JOIN competitors ON competitors.P_ID=products.P_ID

This will get you repeat information for the products (competitor info will be different). And then you will use your application to parse that info into the array you need.

If you want a single row for the products, you can use the GROUP_CONCAT function for the competitors:

SELECT products.P_ID, products.product_name, 
  GROUP_CONCAT(competitors.info SEPARATOR ',') as competitor_info
 FROM products
 LEFT JOIN competitors ON competitors.P_ID=products.P_ID
 GROUP BY products.P_ID;

A major caveat you should be aware of: ON UPDATE|DELETE CASCADE does not fire triggers on the rows that were deleted/updated by the cascade:

Note

Currently, cascaded foreign key actions do not activate triggers.

This shouldn't stop you from using CASCADE if you want to remove orphan rows. But you should be aware of it.

MySQL – Optimal Configs for ALTER of Large MyISAM Table to InnoDB

RECOMMENDATION #1

Let's presume you have a fair amount of disk space on your cloud server. You create the InnoDB table in stages to watch the effect of diskspace and RAM in slow motion.

Let's say your MyISAM table is called mydb.mytable and you want to convert it to InnoDB.

Try the following:

CREATE TABLE mydb.mytable_new LIKE mydb.mytable;
ALTER TABLE mydb.mytable_new ENGINE=InnoDB;
INSERT INTO mydb.mytable_new SELECT * FROM mydb.mytable;
ALTER TABLE mydb.mytable RENAME mydb.mytable_old;
ALTER TABLE mydb.mytable_new RENAME mydb.mytable;

Play with the table.

If you like its performance, then

DROP TABLE mydb.mytable_old;

If you do not like its performance, then

TRUNCATE TABLE mydb.mytable_old;
INSERT INTO mydb.mytable_old SELECT * FROM mydb.mytable;
ALTER TABLE mydb.mytable RENAME mydb.mytable_zap;
ALTER TABLE mydb.mytable_old RENAME mydb.mytable;
DROP TABLE mydb.mytable_zap;

RECOMMENDATION #2

You mentioned you are running MySQL 5.1. Unless you install the InnoDB Plugin, InnoDB is single threaded. To get InnoDB to be multithreaded:

Install MySQL 5.6
Install MySQL 5.1's InnoDB Plugin if you want to keep MySQL 5.1 (Applicable to MySQL 5.1.38+)

Once you do either one, please configure InnoDB to engage multiple cores. See my past posts:

May 26, 2011 : About single threaded versus multithreaded databases performance
Sep 12, 2011 : Possible to make MySQL use more than one core?
Sep 20, 2011 : Multi cores and MySQL Performance