How pt-online-schema-change Ensures Consistency During Data Copy

alter-table, MySQL, percona, percona-toolkit, percona-tools

I have read the pt-online-schema-change documentation and understand that it works by creating triggers and copying data in chunks, each chunk sized to copy in about chunk-time seconds (0.5 seconds by default).

Let us consider the case below.

We have a 5 GB table TBL1 that is being altered with pt-online-schema-change.

pt-online-schema-change created the new table with the desired change, added triggers on
TBL1, and started copying.
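
Schematically, I understand the setup as roughly the following (TBL1's columns id/col, the new column, and the trigger names are all made up for illustration, and the exact statements vary by tool version):

    -- Shadow table with the desired change applied
    CREATE TABLE _TBL1_new LIKE TBL1;
    ALTER TABLE _TBL1_new ADD COLUMN new_col INT;

    -- Triggers mirror ongoing writes on TBL1 into the shadow table
    CREATE TRIGGER pt_osc_tbl1_upd AFTER UPDATE ON TBL1
      FOR EACH ROW REPLACE INTO _TBL1_new (id, col) VALUES (NEW.id, NEW.col);
    CREATE TRIGGER pt_osc_tbl1_del AFTER DELETE ON TBL1
      FOR EACH ROW DELETE IGNORE FROM _TBL1_new WHERE id = OLD.id;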

It copied 10 chunks of data and started copying the 11th chunk, which is some 'X' MB in size.
Per the tool's target, this chunk should be copied in at most 500 milliseconds (0.5 seconds).

During the copy of the 11th chunk, the application issued an UPDATE that modifies rows
that are part of this 11th chunk of data.

My question is: which of the following happens to the original table now?

A) The UPDATE will be blocked until the chunk is copied, then applied to the original
   table, and propagated to the new table through the triggers.

B) The UPDATE will be performed on the original table, and the data chunk will be
   re-copied all over again.

If option A is the answer, how can pt-online-schema-change be lock-free?

If option B is the answer, how does it know that the chunk being copied was modified?

Thanks in advance.

Best Answer

pt-online-schema-change takes advantage of the consistency guarantees that relational databases provide at the table level. Let's see what happens when we write to the original table; it is one of three cases:

  1. The chunk has already been copied. No problem: the trigger will overwrite the values in the new table within the same transaction.

  2. The chunk has not yet been copied. The trigger will probably fail, but the original modification will go through thanks to the IGNORE option on the trigger's INSERT/UPDATE/DELETE. No problem: when the pt-osc copy process reaches that chunk, it will copy the newer values.

  3. Your question: the chunk is being copied. There are two subcases, depending on which process starts first.

    a) If the modification of the original table starts first, either the whole table (in the case of MyISAM) or the affected set of rows (in the case of InnoDB) will be locked against other modifications, so pt-osc has to wait for the write to finish. You should avoid very time-consuming write operations while this tool is working, or rely on it pausing itself when the load is too high. In the unfortunate case that you run another large update concurrently, the session variable innodb_lock_wait_timeout (configured by the --lock-wait-timeout parameter, 1 second by default) should kick in for pt-osc and abort the copy of that chunk. The tool will then retry that chunk up to --retries times (3 by default); after that, the whole process aborts, in theory gracefully. In most cases, once the lock disappears (because the concurrent write finishes), you are essentially in case 2.

    b) If pt-osc starts first, it will again lock the whole table (in MyISAM) or those specific rows (in InnoDB) with a shared (read) lock, preventing concurrent writes; see the copy-statement sketch after this list. Ideally this lock is held for at most about 0.5 seconds, since the chunk size is calculated so that copying it should not take longer than that. Once the copy has finished, you are essentially in case 1.
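
To make case 3 concrete, the per-chunk copy that pt-osc runs looks roughly like the following sketch (column names and chunk boundaries are illustrative). The shared lock is what briefly blocks concurrent writers in case 3b, and the IGNORE ensures that if a trigger has already placed a newer version of a row into the shadow table, the copy does not overwrite it with stale data:

    -- Sketch of the per-chunk copy
    INSERT LOW_PRIORITY IGNORE INTO _TBL1_new (id, col)
    SELECT id, col
      FROM TBL1 FORCE INDEX (`PRIMARY`)
     WHERE id >= 100001 AND id <= 110000  -- boundaries of the 11th chunk
      LOCK IN SHARE MODE;                 -- shared lock: concurrent writes to
                                          -- these rows wait until the copy ends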

So, yes, there is no 100% lock-free tool, and no such tool can exist if you want to guarantee consistency. Even the final rename of the tables takes microseconds (after all, one or several files must be renamed on disk), and for those microseconds the table will be locked; see the rename sketch after the list below. The idea is to keep the amount of locking to a minimum. With MyISAM, every time you write a row (except for concurrent inserts) the whole table is locked; with InnoDB, at least that particular row (sometimes more). You cannot change that unless you sacrifice the atomicity and durability of changes. However, you can take into account some tricks to minimize the problems:

  • Use InnoDB instead of MyISAM for extra write concurrency
  • Avoid slow/large writing transactions on that table during the copy process
  • Lower the value of the --chunk-time parameter. In an extreme case, you could set --chunk-size to 1, copying and locking one row at a time, but that will be extremely slow. The chunk-time is a heuristic worst-case estimate of the maximum locking time when you happen to write the exact rows being copied at that moment.
  • MySQL 5.6 allows concurrent writes while certain DDLs are running (see the example after this list). It is less flexible, but it should work faster, as it has access to the underlying logic. In some cases this may work better, but it will suffer from similar problems when there are too many concurrent writes (as those will be buffered).
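
As mentioned above, the one unavoidable lock is the final, atomic swap of the tables, which looks roughly like this (names illustrative):

    -- Atomic swap at the end of the copy; the table is locked only for the
    -- instant the rename itself takes
    RENAME TABLE TBL1 TO _TBL1_old,
                 _TBL1_new TO TBL1;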
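And for comparison, the MySQL 5.6 online DDL from the last point looks like this, assuming the change (new_col is illustrative) is one that supports the INPLACE algorithm:

    -- Native online DDL: concurrent reads and writes are allowed while it runs
    ALTER TABLE TBL1
      ADD COLUMN new_col INT,
      ALGORITHM=INPLACE, LOCK=NONE;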

Reading the tool's code is not that hard; I invite you to read it for more information.