Introduction to transactions and savepoints
In your scenario "Creating a new user" the basic process could be the following:
BEGIN;
INSERT INTO `users` (`username`, `password`, ...) VALUES ("me", "XXX", ...);
SET @userid := LAST_INSERT_ID();
INSERT INTO `profile_pictures` (`user_id`, `profile_picture_path`)
VALUES (@userid, '/somewhere/in/the/filesystem.jpg');
COMMIT;
If for whatever reason the process should end in the middle, it would be enough to issue a ROLLBACK;
query somewhere before and instead of the COMMIT
. All changes would be undone. This is one of the most basic tasks, which can be accomplished by transactions.
The mentioned savepoints by Rolando might come in handy for example if one inserts the row into profile_pictures
first, then tries to copy the image to the desired location, and if this should fail one only wants to undo the second insert, but not the first. (If copying the picture fails, there is no reason to cancel the whole user registration.) For this one can define a savepoint after inserting into users
, but before inserting into profile_pictures
:
BEGIN;
INSERT INTO `users` (`username`, `password`, ...) VALUES ("me", "XXX", ...);
SET @userid := LAST_INSERT_ID();
SAVEPOINT before_profile_picture;
INSERT INTO `profile_pictures` (`user_id`, `profile_picture_path`)
VALUES (@userid, '/somewhere/in/the/filesystem.jpg');
COMMIT;
In this process one could issue ROLLBACK TO before_profile_picture;
to undo the last insert, but leave the inserted user. If one would issue a COMMIT
after this rollback to a specific savepoint, the row in users
would be saved whereas nothing would be left in profile_pictures
.
...and why they are not used widely...
Transactions only make sense in combination with the storage engine InnoDB, as MyISAM simply does not support transactions. The usage of InnoDB and its locking strategy (row-level locking instead of table locking like MyISAM does) in combination with transactions shows some clear advantages, but also some unclear disadvantages.
The question is, why they are not so common, so we concentrate on disadvantages. The most popular disadvantage is deadlocks.
Explanation of Deadlocks
A deadlock occurs when for example two clients try the following queries at the same time:
Client 1
BEGIN;
SELECT * FROM table1 FOR UPDATE; /* Query 1.1 */
SELECT * FROM table2 FOR UPDATE; /* Query 1.2 */
Client 2
BEGIN;
SELECT * FROM table2 FOR UPDATE; /* Query 2.1 */
SELECT * FROM table1 FOR UPDATE; /* Query 2.2 */
I numbered the four queries for clear reference in the following explanation.
Since the database needs to perform queries in any linear order, it could happen that the order is:
Query 1.1
Query 2.1
Query 1.2
Query 2.2
which would be bad. Query 1.1 locks table1
completely. Query 2.1 then locks table2
completely. Now query 1.2 wants to lock table2
, but table2
is already locked by Client 2. This query will block and wait for the other client to finish its operation. So query 2.2 is executed, but this one wants to acquire a lock for table1
which is locked by client 1 and so would wait for it to finish. So both clients wait for each other to finish. This will never happen of course. InnoDB detects such situations and automatically rolls back one of the transactions.
The application(!) receives an error and in theory has the responsibility to restart the transaction. Although this runs under "error" a deadlock is a normal situation in transactional databases and restarting a transaction should be more or less normal workflow for applications: "Always be prepared to re-issue a transaction if it fails due to deadlock. Deadlocks are not dangerous. Just try again."
How this affects behavior and application development
We found out in the last section that with the introduction of InnoDB and transactions deadlocks can occur, which the application must be able to cope with.
If I say the application must be able to cope with it, I basically mean the developers of the application must have the capability (and motivation...) to cope with it. This is one of the biggest problems transactions have! A lot of average developers see deadlocks as a flaw of the DBMS, which should not occur and they have no clue how to deal with them.
This only changed in the last like 2 to 5 years or so.
Even if the developers of open source projects understood the reasons for the usage of InnoDB and transactions, some deadlocks are kind of tedious and require a lot of effort to solve. One must have a thorough understanding of database transactions, isolation levels, locking and the application itself, to lower the occurrences of deadlocks. Take a look at How to Cope with Deadlocks in the MySQL manual to get a glimpse of what may be necessary to do in those cases. Often this is considered too much effort, when the advantages are not clear and the skills ought to be learned first.
This also results in customers filling support tickets containing problems like "long waiting times" (since transactions need to wait for other transactions to free the locks), "strange errors" (deadlocks) and similar problems. Customers do not and do not want to understand database theory, deadlocks or whatever. They are interested in a fully functional software.
If one has the choice between using transactions, coping with deadlocks, learning skills, analyze the application thoroughly and answering support tickets in any way a non-technical person can understand what is happening, and on the other side just using MyISAM tables which may run fine in most circumstances, I do understand why it took so long and still takes time for most projects to adopt InnoDB with transactions.
I do see the main reason for the slow adoption in social aspects as mere technical aspects. Hope this post could clarify some aspects concerning the usage of transactions and why their usage is not that broad.
Best Answer
If you are not in a transaction (i.e. if
autocommit
is 1 and you didn'tBEGIN
a new transaction), then every query you run implicitly starts its own transaction, and the transaction is committed or rolled back (depending on circumstances, such as when certain errors occur) immediately when the query finishes executing.In other words, all DML is done in a transaction... it's only a matter of whether it's implicit or explicit. In InnoDB, everything (except DDL of course) is a transaction.
So, whether you control transactions yourself or not doesn't impact the range of appropriate settings, here. Like almost all other variables, you should leave
innodb_log_file_size
at the default value unless you have a specific and valid reason to change it and understand the rationale for the change... and note that the suggestions output by tuning scripts do not constitute a valid reason to change anything. As a rule, stay away from those.innodb_flush_log_at_trx_commit
controls strict ACID compliance, and should be set to 1 if the database is critical (typical setting for master), 2 if it's important (typical setting for replicas) or 0 if you don't care all that much. It's a little more complex than that, but safest to less safe, the order is 1, 2, 0. Fastest to slowest is generally the opposite, 0, 2, 1. 2 is a relaxed version of 1 but the difference between 1 and 2 is typically more significant than the difference between 2 and 0. The impact of this variable is also fundamentally the same whether your transactions are explicit or implicit because, again, everything in InnoDB is a transaction.