MySQL – How to Understand DELETE Operation Internally

MySQL

I have two major doubts:

Does a DELETE query first identify all rows to be deleted (from the where clause) and then delete them all, or does it delete rows as and when they match the where clause?
What level of lock does a DELETE query take? Is a full table lock? If not then how is the case of insertion/update while the query is running handled?

For instance I run a DELETE where column X has value 0. At the time the DELETE query examined a particular row the column X had value 1, then later (but before the query finished) some other user updated that row's X column to 0. So what happens when the delete finishes? Will that row be present or absent?

Best Answer

DELETE deletes rows as it sees them while using record locks. The delete actually happens at commit time. A log of the changes is made place so other transactions that started before the delete can still see the values.

If an UPDATE happens a small period later, then either a deadlock will happen, or it will lock wait and return no rows found.

So main thing to note is programming:

always handle errors like deadlocks, there's no obvious way for the database to resolve to consider your business case which is why it exits.
use transactions when you need all or none approaches to data changes.

STATEMENT-BASED REPLICATION

If you run DELETE FROM tblname WHERE blahblahblah; and the rows matching blahblahblah do not exist, no big deal. Replication will just carry on. It will just take you a lot longer to realize your data draft on the Slave (or in your case, either Master) if there were slight differences at all.

ROW-BASED REPLICATION

If you run DELETE FROM tblname WHERE blahblahblah; and the rows matching blahblahblah do not exist, that can break replication because the exact row info is embedded in the relay logs on slave. That row is expected to exist for the delete to happen. There is an error code for it:

sh-4.1# perror 1032
MySQL error code 1032 (ER_KEY_NOT_FOUND): Can't find record in '%-.192s'

You will likely see error log messages like these:

2014-02-27 22:03:00 4070 [ERROR] Slave SQL: Could not execute Update_rows event on table mydb.mytable; Can't find record in 'mytable', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000166, end_log_pos 93114177, Error_code: 1032

2014-03-17 10:50:15 11596 [ERROR] Slave SQL: Could not execute Delete_rows event on table mydb.mytable; Can't find record in 'mytable', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000337, end_log_pos 427881, Error_code: 1032

Here are other links that discuss this

For more insights, please read Advantages and Disadvantages of Statement-Based and Row-Based Replication from the MySQL Documentation.

Sql-server – How to interpret correlated subqueries

This specific case (the exists predicate) is a bit of an outlier as far as correlated subqueries go. ish...

Some correlated subqueries are certainly evaluated once for every row in the outer query, but I don't think you can apply that across the board, and I specifically think that's a statement that cannot be applied with authority to the exists predicate.

The SQL example below this paragraph is highly contrived, but nevertheless illustrates an "evaluated once for each row in the outer query" correlated subquery. It's "highly contrived" because a straight relational join would be a much better solution, and in this simple case the join is super easy to write.

select
    team_name
    , (
       select league_name
       from leagues
       where leagues.league_id = teams.league_id
    ) as league_name
from teams

In this instance, each row in the result set is going to contain the name of the league that the team belongs to, because of the correlated subquery. And the correlated subquery is probably going to be run once for each row in the outer query. Maybe.

The thing is, the database's query optimizer might rewrite this as a more efficient join, depending on the database. So even this simple example might not be executed "once per row in the outer query."

Either way, the subquery is dependent on the values returned from the table in the outer query, and that makes it a correlated subquery--because of the correlation with the outer table.

Part of the point of SQL in general is that the specific implementation is not important to you.

You should think in terms of sets of data (think set arithmetic), not in terms of iterations and comparisons per iteration.

The exists predicate doesn't make you any promises about the particular implementation. In general, exists is supposed to take advantage of table statistics and indexes in such a way that it produces a more efficient query plan than many other approaches will (and more efficient than you're likely to come up with on your own).

The actual query that ends up being run is going to be some kind of join, or multiple joins. But you don't need to get worked up about that. That's why your exists clause contains select * instead of identifying specific fields.

The query plan produced by the database engine will figure out which fields it wants to use, which key or keys and which index or indexes matter, etc.

So your exists example is still a correlated subquery, despite the underlying implementation, because the inner query refers to and is logically dependent upon the table in the outer query.

SELECT DISTINCT store_type FROM stores
WHERE EXISTS
    (
    SELECT * FROM cities_stores
    WHERE cities_stores.store_type = stores.store_type
    );

Best Answer

Related Solutions

MySQL – Safe to Delete Rows on Both Masters in Master-Master Replication?

STATEMENT-BASED REPLICATION

ROW-BASED REPLICATION

Sql-server – How to interpret correlated subqueries

Part of the point of SQL in general is that the specific implementation is not important to you.

Related Question