Mysql – Are MySQL transactions under-utilised

innodbMySQLtransaction

I have a vague understanding of transactions that it means you can help the server understand that a series of queries are related.

Scenario : Suppose I have a series of queries which creates a user – inserts a user info record, creates an empty profile record and initialises a few other records etc..

If for whatever reason the script were to fail half way through this, what do I need to do to ensure that everything to do with the "creating a user queries" gets undone?

From what I understand reading the MySQL documentation, it as simple as using START TRANSACTION and COMMIT statements.

If that is so, why do so many CMS's neglect to do so? I suppose a common argument against it might be that it is unnecessary for short scripts dealing with only a few record updates, but am I not right in thinking that it is also highly beneficial when you are dealing with replication environments too?

Best Answer

Introduction to transactions and savepoints

In your scenario "Creating a new user" the basic process could be the following:

BEGIN;
INSERT INTO `users` (`username`, `password`, ...) VALUES ("me", "XXX", ...);
SET @userid := LAST_INSERT_ID();
INSERT INTO `profile_pictures` (`user_id`, `profile_picture_path`)
  VALUES (@userid, '/somewhere/in/the/filesystem.jpg');
COMMIT;

If for whatever reason the process should end in the middle, it would be enough to issue a ROLLBACK; query somewhere before and instead of the COMMIT. All changes would be undone. This is one of the most basic tasks, which can be accomplished by transactions.

The mentioned savepoints by Rolando might come in handy for example if one inserts the row into profile_pictures first, then tries to copy the image to the desired location, and if this should fail one only wants to undo the second insert, but not the first. (If copying the picture fails, there is no reason to cancel the whole user registration.) For this one can define a savepoint after inserting into users, but before inserting into profile_pictures:

BEGIN;
INSERT INTO `users` (`username`, `password`, ...) VALUES ("me", "XXX", ...);
SET @userid := LAST_INSERT_ID();
SAVEPOINT before_profile_picture;
INSERT INTO `profile_pictures` (`user_id`, `profile_picture_path`)
  VALUES (@userid, '/somewhere/in/the/filesystem.jpg');
COMMIT;

In this process one could issue ROLLBACK TO before_profile_picture; to undo the last insert, but leave the inserted user. If one would issue a COMMIT after this rollback to a specific savepoint, the row in users would be saved whereas nothing would be left in profile_pictures.

...and why they are not used widely...

Transactions only make sense in combination with the storage engine InnoDB, as MyISAM simply does not support transactions. The usage of InnoDB and its locking strategy (row-level locking instead of table locking like MyISAM does) in combination with transactions shows some clear advantages, but also some unclear disadvantages.

The question is, why they are not so common, so we concentrate on disadvantages. The most popular disadvantage is deadlocks.

Explanation of Deadlocks

A deadlock occurs when for example two clients try the following queries at the same time:

Client 1

BEGIN;
SELECT * FROM table1 FOR UPDATE; /* Query 1.1 */
SELECT * FROM table2 FOR UPDATE; /* Query 1.2 */

Client 2

BEGIN;
SELECT * FROM table2 FOR UPDATE; /* Query 2.1 */
SELECT * FROM table1 FOR UPDATE; /* Query 2.2 */

I numbered the four queries for clear reference in the following explanation.

Since the database needs to perform queries in any linear order, it could happen that the order is:

Query 1.1
Query 2.1
Query 1.2
Query 2.2

which would be bad. Query 1.1 locks table1 completely. Query 2.1 then locks table2 completely. Now query 1.2 wants to lock table2, but table2 is already locked by Client 2. This query will block and wait for the other client to finish its operation. So query 2.2 is executed, but this one wants to acquire a lock for table1 which is locked by client 1 and so would wait for it to finish. So both clients wait for each other to finish. This will never happen of course. InnoDB detects such situations and automatically rolls back one of the transactions.

The application(!) receives an error and in theory has the responsibility to restart the transaction. Although this runs under "error" a deadlock is a normal situation in transactional databases and restarting a transaction should be more or less normal workflow for applications: "Always be prepared to re-issue a transaction if it fails due to deadlock. Deadlocks are not dangerous. Just try again."

How this affects behavior and application development

We found out in the last section that with the introduction of InnoDB and transactions deadlocks can occur, which the application must be able to cope with.

If I say the application must be able to cope with it, I basically mean the developers of the application must have the capability (and motivation...) to cope with it. This is one of the biggest problems transactions have! A lot of average developers see deadlocks as a flaw of the DBMS, which should not occur and they have no clue how to deal with them.

This only changed in the last like 2 to 5 years or so.

Even if the developers of open source projects understood the reasons for the usage of InnoDB and transactions, some deadlocks are kind of tedious and require a lot of effort to solve. One must have a thorough understanding of database transactions, isolation levels, locking and the application itself, to lower the occurrences of deadlocks. Take a look at How to Cope with Deadlocks in the MySQL manual to get a glimpse of what may be necessary to do in those cases. Often this is considered too much effort, when the advantages are not clear and the skills ought to be learned first.

This also results in customers filling support tickets containing problems like "long waiting times" (since transactions need to wait for other transactions to free the locks), "strange errors" (deadlocks) and similar problems. Customers do not and do not want to understand database theory, deadlocks or whatever. They are interested in a fully functional software.

If one has the choice between using transactions, coping with deadlocks, learning skills, analyze the application thoroughly and answering support tickets in any way a non-technical person can understand what is happening, and on the other side just using MyISAM tables which may run fine in most circumstances, I do understand why it took so long and still takes time for most projects to adopt InnoDB with transactions.

I do see the main reason for the slow adoption in social aspects as mere technical aspects. Hope this post could clarify some aspects concerning the usage of transactions and why their usage is not that broad.