Mysql – How to update a (very) large table without locking in MySQL

lockingmyisamMySQLupdate

I have a large table (58+ million records) which represents a relation between two records (player and target) in a second table.

Unfortunately, whoever designed our schema didn't think things through properly, and opted to use usernames to represent this relation, instead of the numerical ids for the user records. As things progressed (like they usually do), our usernames are no longer a valid, unique representation of a player so I need to convert these relations to use numerical ids.

Adding the fields without locking was easy thanks to Percona Toolkit, which offers pt-online-schema-change that can ALTER on a live table. Populating the tables, however might be trickier.

The tables looks like this (I've stripped the creates of non-relevant fields), with the two unpopulated fields being player_id and target_id:

CREATE TABLE `player_targets` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `player` varchar(20) NOT NULL,
  `player_id` int(10) unsigned DEFAULT NULL,
  `target` varchar(20) NOT NULL,
  `target_id` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=58000000 DEFAULT CHARSET=latin1;

CREATE TABLE 'player_uuids' (
  `id`int(11) NOT NULL AUTO_INCREMENT,
  `username` varchar(20) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=600000 DEFAUL CHARSET=latin1;

I was planning on populating the two new fields with a query like this one:

UPDATE player_targets t
INNER JOIN player_uuids u1
  ON u1.username = t.player
INNER JOIN player_uuids u2
  ON u2.username = t.target
SET
  t.player_id = u1.id,
  t.target_id = u2.id
WHERE
  t.player_id IS NULL
  OR t.player_id IS NULL;

With the table storing relations being MyISAM, my interpretation of the documentation is that the UPDATE-query will lock the table until it's finished with all the rows. As the table is large, this will likely not work very well in a live environment.

What would the best approach for this be? writing a script to iterate over batches of relations? Changing the table engine to InnoDB (the table is read-heavy, which I believe is the reason for it being MyISAM)?

Best Answer

You can delete or update by one row in loop statement. In this case there will be no blocking but it will work a little slower. Use limit 1 to do this in procedure:

  PROCEDURE myProcedure()
    BEGIN
    DECLARE c int; -- to calculated affected rows
    set c = 0;

      ml:LOOP
          UPDATE player_targets t
          INNER JOIN player_uuids u1
           ON u1.username = t.player
          INNER JOIN player_uuids u2
           ON u2.username = t.target
          SET
           t.player_id = u1.id,
           t.target_id = u2.id
          WHERE
           t.player_id IS NULL
           OR t.player_id IS NULL 
           LIMIT 1;

          -- check if the loop has completed    
          IF  ROW_COUNT() = 0 THEN
              LEAVE ml;
          END IF;    

          set c = c + 1;

          IF c MOD  100  = 0 THEN
              SELECT CONCAT(c, ' row(s) updated');
          END IF;
      END LOOP;  

      SELECT CONCAT(c, ' row(s) updated; The statement has completed');
    END

And call procedure:

Call myProcedure();