Mysql – How does autocommit=off affects bulk inserts performance in thesql using innodb

MySQLtransaction

I know that turning off autocommit can improve bulk insert performance a lot according to:

And I have made experiments to confirm that conclusion. But what I want to know is, why turning off autocommit can improve bulk inserts? And what happens internally in inno db when a transaction is commited?

Best Answer

If autocommit is turned off, you can always do a ROLLBACK to get back to the state the database was in before you started doing something. COMMIT makes sure all changes to the database are written permanently, doing this for every INSERT takes more time than doing this for a lot of INSERTs, because not only the records needs to be inserted, also the indexes needs to be updated. Doing this last piece for a lot of records is more efficient than doing this for 1 record at a time.

Related Solutions

Mysql – How to optimize performance on a MySQL ndbcluster

First of all, Cluster is a storage engine. It doesn't actually execute queries because it doesn't speak SQL. That is why you use a MySQL server, which parses your queries and sends low-level storage engine API calls to the Cluster data nodes. The data nodes know how to retrieve or store data. Or you can talk to the data nodes directly using the NDB API(s).

MySQL Cluster has various means of executing queries. They boil down to:

Primary key lookup
Unique key lookup
Ordered index scan (i.e., non-unique indexes that use T-trees)
Full table scan

Let's say you have 4 data nodes in your cluster (NoOfReplicas=2). This means you have 2 node groups and each one has half the data. Cluster uses a hash on the primary key (unless you've controlled the partitioning using the 5.1 partitioning features). So for any table, half the rows are in one node group and half the rows are the in other node group.

Now for the 4 types of query execution. You can verify which type of execution is used with EXPLAIN. Here's how each ones works:

Primary key lookup - the MySQL server can calculate the hash on the primary key and know which data node the relevant row resides in. Then the MySQL server contacts that data node and receives the row.
Unique key lookup - the MySQL server cannot know which data node the row might be stored in. So it contacts a pseudo-random node. That data node has a hidden table that maps the unique key values to the primary key. Then the hash on the primary key reveals where the row resides.
Ordered index scans are sent in parallel to all data nodes, where they search their local t-tree.
Full table scans are send in parallel to all data nodes, where they scan their primary fragment.

Summary: primary key lookups are best. If you have more than 2 nodes, throughput goes up because all nodes are actively serving different fragments of data. Ordered index lookup and full table scans are done in parallel, so more nodes leads to better performance.

MySQL 5.5.8 InnoDB Foreign Key, JOIN performance

Foreign key relationships are to enforce data integrity, not for query performance, that is what indexes are for. Also note that InnoDB creates an index on each column with a foreign key relationship.

However I would recommend having the foreign key relationships to ensure that the data is always valid, especially when updating and deleting, which may become significant when you have to start archiving data.

Best Answer

Related Solutions

Mysql – How to optimize performance on a MySQL ndbcluster

MySQL 5.5.8 InnoDB Foreign Key, JOIN performance

Related Question