Mysql – Prepare MySQL for big UPDATE query on production server

MySQLrollbackupdate

Surprisingly I have not found anything relevant on this subject.

What are best practices to prepare production MySQL server for a big update query? The question is broad but I'd like to know details for a typical setup:

MySQL 5.6
80/20 read/write workflow ratio
Only InnoDB
Replication Master
Query Cache enabled
Data fits entirely into memory buffer pool with some spare room left
Healthy HDD drives on hardware RAID-1
Millions of rows in the update query
The query only covers single table (1 primary key – INT)

And the questions:

What to look for?
What mysql configuration parameters will be most relevant for the process to go quickly and smoothly?
Is it worth splitting the update into multiple batches?

Best Answer

The Query cache is a hindrance. (In general, it should be OFF for production systems.) All entries in the QC for that table will be purged by the Update.

Millions of rows -- all of the table? A small fraction of a billion row table, but no good index? What?

Let's dissect the query...

Scan through the entire table. This takes time for the following reasons:

Read the blocks
Write the "undo" records to disk (in case of ROLLBACK or crash)
Write the blocks back to disk
Toss the undo blocks

The size of "undo" may necessitate a less efficient method, adding to the overall time.

The affected rows will be locked; well you may as well plan on the 'entire' table being blocked from other usage.

Batching...

Avoids blocking as long
Probably takes longer overall
Complexity in the code
Not atomic (if that matters)

Why do you need to do such a big Update? Perhaps the column you are changing should be in a separate, parallel, table to avoid this much impact? Or 'computed' by subsequent SELECTs?

Config changes? Assuming the innodb_buffer_pool_is big enough, nothing useful to do.

For chunking, see advice on chunking -- that talks about deleting, but can be adapted to updating.

Related Solutions

Mysql: update query with subquery

The question is a unclear on how the tables are to actually related to each other.

Since the query you've posted is not working, it's difficult to tell which element of the logic contains the error, and with a column called "oldID" it's hard to understand the necessary join logic.

I have attempted to interpret and rewrite your query based on trying to understand it from the innermost query to the outermost query, it seems like this might be what you are trying to accomplish.

My table "ta" is doing the same thing as your table "ta" seems to be trying to do, and my table "a" is doing the same thing your table "a" seems to be trying to do. They are reversed in their order of appearance in the query because, as I mentioned, if I'm understanding your query correctly, the original query seems to be written inside-out.

UPDATE Taxonomy t
  JOIN TaxonomyMapping ta ON ta.oldID = t.oldID
  JOIN CategoryMapping cm ON cm.language_ID = ta.oldID
  JOIN TaxonomyMapping a ON a.oldID = cm.en_id
   SET t.sourceId = a.id
 WHERE t.id > -1;

However, if your "TaxonomyMapping" and "Taxonomy" tables are an "exact copy" (containing the same data) then table "ta" is not necessary, and all you actually need is this:

UPDATE Taxonomy t
  JOIN CategoryMapping cm ON cm.language_ID = t.oldID
  JOIN TaxonomyMapping a ON a.oldID = cm.en_id
   SET t.sourceId = a.id
 WHERE t.id > -1;

In fact, it seems like you could even do it without a permanent copy.

UPDATE Taxonomy t
  JOIN CategoryMapping cm ON cm.language_ID = t.oldID
  JOIN (SELECT * FROM Taxonomy) x ON x.oldID = cm.en_ID
   SET t.sourceId = x.id
 WHERE t.id > -1;

That last example works because the server materializes a temporary copy of "Taxonomy" called table "x" before executing the rest of the query. By doing that, it's able to update the table with data selected from elsewhere in the same table, which you normally can't do... but since a copy is technically a different table, it works. The temporary copy is usually created in memory and then moved to disk if it turns out to be large, but either way, it's automatically deleted as soon as the query is finished. The down-side is that the new table won't have any indexes so this approach won't infinitely scale... but should still be faster than your original correlated subquery.

Of course, none of these will work if I have failed to understand the necessary logic.

Thesql update query almost 5 minutes slower in 5.6 vs 5.5

OPTIMIZE TABLE is essentially never needed for InnoDB.

New with 5.6: ALTER TABLE uses ALGORITHM=INPLACE for many actions (add/drop column/index, etc.). I would expect 5.6 to be faster.

What is that list of commands trying to do? Something to do with Normalization? Seems like it could be done in a couple of queries, without needing the extra table.

innodb_buffer_pool_size should be set to about 70% of available RAM. That will improve performance on any version.

Don't index flags (deleted), it will slow down INSERTs, without speeding up SELECTs.

Don't use 4-byte integers (INT) for flags; use the 1-byte TINYINT.

No currency in the world needs 10 decimal places: price decimal(23,10). And I doubt if room for 10 trillion 'dollars' is needed.

What's the value of autocommit during the UPDATEs?

Best Answer

Related Solutions

Mysql: update query with subquery

Thesql update query almost 5 minutes slower in 5.6 vs 5.5

Related Question