Postgresql – Can somebody explain BEGIN; and COMMIT; for me once and for all

postgresqltransaction

Let's say I have this PHP code:

dbcall('BEGIN');
dbcall('CREATE SCHEMA "cool schema"');
dbcall('CREATE TABLE "cool schema"."cool table"');
dbcall('COMMIT');

Am I correct in thinking that this will first create a schema, and if the table cannot be created (for any reason, such as a temporary glitch or whatever), the schema that was just created is deleted again, so that the entire "state" of the database is reset to exactly how it was hen the BEGIN query was executed?

I've read the manual on transactions numerous times over the years, and asked people questions like this many times, but I've never got a clear, unambiguous answer, so I've never dared to actually use BEGIN and COMMIT queries anywhere in my application code.

This makes me feel as if my software isn't as "solid" as it could be, and I'm worried that temporary glitches could cause all kinds of issues because I oftentimes execute two queries which are very related, and where one doesn't make sense without the other, and those seem like perfect candidates for the BEGIN/COMMIT syntax.

And what if there is never a COMMIT query being made? Will it just continue piling on query after query and never "commit" them? And then throw them away when the script/session finishes? Will it COMMIT automatically if you make another BEGIN before a COMMIT?

If it has to "roll back" created records and other changes, what happens if other scripts have at the same time used the data in the database while it was working, which now no longer exists?

This truly boggles my poor mind. I cannot comprehend how this can be possible at all, from a logical perspective.

Best Answer

You are already using transactions, because every statement in PostgreSQL runs in a transaction. So no need to be afraid of transactions.

By default, PostgreSQL is running in autocommit mode, so every statement has its own transaction. To get a transaction that spans more than one statement, you explicitly start it with START TRANSACTION (or BEGIN).

Until a transaction is committed, its effects won't be visible outside the transaction itself. If you terminate the transaction with ROLLBACK, all its effects remain invisible. If you terminate the transaction with COMMIT, its effects suddenly become visible.

This works for normal data modifying transactions as well as for DDL statements. After all, DDL statements are mostly data modifications of the metadata tables (of course, there is more to it, like files being created or removed, which is deferred until the transaction commits).

If that seems too magical to you, dive into the depths of how it works under the hood. The documentation has little about that, but you can read the source, perhaps starting with src/backend/access/transam/README and src/include/access/htup_details.h. Blogs may be helpful too.

Start two sessions and play with it. Soon you will get the hang of this wonderful feature that makes a coder's life so much easier when applied sensibly.

Related Solutions

SQL Server Blocking – Troubleshooting Sleeping SPID Blocking Other Transactions

From comments, I'm guessing you had a client side Command timeout that has aborted the SQL query. This does not rollback the transaction because the connection stays open on SQL Server due to connection pooling.

So, you need to use SET XACT_ABORT ON or add some client rollback code

See SQL Server Transaction Timeout for all the gory details

Sql-server – Error on DDL statement never invokes CATCH

Why isn't it caught? Because TRY / CATCH doesn't catch all types of errors. But you can get it to catch the non-system-critical stuff by wrapping it in an EXEC(), such as:

BEGIN TRY
   EXEC('ALTER TABLE...');
END TRY

The error in the ALTER will fail out to the EXEC, which in turn will return to the TRY / CATCH block reporting a simple, catchable error.

For more detail, the MSDN page for TRY...CATCH states:

The following types of errors are not handled by a CATCH block when they occur at the same level of execution as the TRY…CATCH construct:

Compile errors, such as syntax errors, that prevent a batch from running.

Errors that occur during statement-level recompilation, such as object name resolution errors that occur after compilation because of deferred name resolution.

That second bullet point, about object name resolution errors, is what is causing the ALTER statement to fail and not be eligible to get handled by the CATCH block.

EDIT:
Regarding your statement about "DDL statements are implicit COMMIT": I am not sure what you mean by this exactly, but all (well, nearly all) queries are transactions by themselves. Meaning, single-query batches have no use for explicit BEGIN TRAN / COMMIT / ROLLBACK. If the statement fails it automatically rolls back, and if it succeeds then it automatically commits. You would only use the BEGIN TRAN / COMMIT / ROLLBACK construct if you want to conditionally rollback for some reason. It is still a good idea to use the TRY...CATCH structure to properly handle errors, but you gain nothing (at least with the code posted in the question) with the explicit transaction handling. There is absolutely no difference between the posted code and the following:

BEGIN TRY
   EXEC('ALTER TABLE...');
END TRY
BEGIN CATCH
   SELECT 'I''m in Catch';
END CATCH;

Best Answer

Related Solutions

SQL Server Blocking – Troubleshooting Sleeping SPID Blocking Other Transactions

Sql-server – Error on DDL statement never invokes CATCH

Related Question