Mysql – How to speed up update join query between 2 thesql tables

innodbjoin;MySQLperformancequery-performanceupdate

alright long story short.,
i have powerful dedicated server.

Intel  I7-6700K -
64GB DDR4 2400 MHz
1x480GB   SSD

running mysql server along with nginx,php

innodb-ft-min-token-size = 1
innodb-ft-enable-stopword = 0

innodb_buffer_pool_size = 40G
max_connections = 2000

[deploy@ns540545 ~]$ free -h
              total        used        free      shared  buff/cache   available
Mem:            62G         45G         11G        107M        6.4G         16G
Swap:          2.0G        1.4G        640M

it was expensive so i got another dedicated server for cost cutting lets call it
not-so powerful dedicated server

Intel  i3-2130  
8GB DDR3 1333 MHz   
2TB 

running mysql server along with nginx,php

innodb-ft-min-token-size = 1
innodb-ft-enable-stopword = 0

innodb_buffer_pool_size = 4G
max_connections = 2000

[root@privateserver deploy]# free -h
              total        used        free      shared  buff/cache   available
Mem:           7.7G        7.5G         73M         24M        150M         79M
Swap:           39G        7.8G         32G

i moved database from powerful server to not-so powerful server.

I can feel slight performance degradation while running simple queries which is fine, but this one query which used to take 2 minutes on powerful server now it takes around 26.6525 hours and counting on not-so powerful server.

UPDATE content a JOIN peers_data b ON a.hash = b.hash SET a.seeders = b.seeders, a.leechers = b.leechers, a.is_updated = b.is_updated

More info about tables which are exactly same on both the dedicated server

CREATE TABLE `peers_data` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `hash` char(40) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
  `seeders` int(11) NOT NULL DEFAULT '0',
  `leechers` int(11) NOT NULL DEFAULT '0',
  `is_updated` int(1) NOT NULL DEFAULT '1',
  PRIMARY KEY (`hash`),
  UNIQUE KEY `id` (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

CREATE TABLE `content` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `content_id` int(11) unsigned NOT NULL DEFAULT '0',
  `hash` char(40) CHARACTER SET ascii NOT NULL DEFAULT '',
  `title` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
  `tags` varchar(1000) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
  `category` smallint(3) unsigned NOT NULL DEFAULT '0',
  `category_name` varchar(50) CHARACTER SET ascii COLLATE ascii_bin DEFAULT '',
  `sub_category` smallint(3) unsigned NOT NULL DEFAULT '0',
  `sub_category_name` varchar(50) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
  `size` bigint(20) unsigned NOT NULL DEFAULT '0',
  `seeders` int(11) unsigned NOT NULL DEFAULT '0',
  `leechers` int(11) unsigned NOT NULL DEFAULT '0',
  `upload_date` datetime DEFAULT NULL,
  `uploader` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '0',
  `uploader_level` varchar(10) CHARACTER SET ascii COLLATE ascii_bin NOT NULL DEFAULT '',
  `comments_count` int(11) unsigned NOT NULL DEFAULT '0',
  `is_updated` tinyint(1) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`content_id`),
  UNIQUE KEY `unique` (`id`) USING BTREE,
  KEY `hash` (`hash`),
  KEY `uploader` (`uploader`),
  KEY `sub_category` (`sub_category`),
  KEY `category` (`category`),
  KEY `title_index` (`title`),
  KEY `category_sub_category` (`category`,`sub_category`),
  KEY `seeders` (`seeders`),
  KEY `uploader_sub_category` (`uploader`,`sub_category`),
  KEY `upload_date` (`upload_date`),
  KEY `uploader_upload_date` (`uploader`,`upload_date`),
  KEY `leechers` (`leechers`),
  KEY `size` (`size`),
  KEY `uploader_seeders` (`uploader`,`seeders`),
  KEY `uploader_size` (`uploader`,`size`),
  FULLTEXT KEY `title` (`title`),
  FULLTEXT KEY `tags` (`tags`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;


mysql> explain UPDATE content a JOIN peers_data b ON a.hash = b.hash SET a.seeders = b.seeders, a.leechers = b.leechers, a.is_updated = b.is_updated ;
+----+-------------+-------+------------+--------+---------------+---------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref  | rows    | filtered | Extra       |
+----+-------------+-------+------------+--------+---------------+---------+---------+------+---------+----------+-------------+
|  1 | UPDATE      | a     | NULL       | ALL    | NULL          | NULL    | NULL    | NULL | 4236260 |   100.00 | NULL        |
|  1 | SIMPLE      | b     | NULL       | eq_ref | PRIMARY       | PRIMARY | 160     | func |       1 |   100.00 | Using where |
+----+-------------+-------+------------+--------+---------------+---------+---------+------+---------+----------+-------------+
2 rows in set (0.00 sec)

records in peers_data 6,367,417
records in content 4,236,268

How can i speed up the above update join query ?
i was expecting about 1 hour on not-so powerful server, but 26 hours+ is too much.

what i am doing wrong ? or missing here ?

i have tried to compensate for RAM on not-so powerful server by setting 32 GB + swap space.
is innodb buffer pool 4 gb too much ?


so far i have tried.

  1. fixing both charset type same
  2. setting innodb buffer pool size to default 128M

and now the task completed in 19.93 Hours.


as per RolandoMySQLDBA suggestion,

i am trying SUGGESTION #2 as those Indexes are required for site to function properly.

i have set innodb_buffer_pool_size = 5G

and trying

SET GLOBAL innodb_change_buffer_max_size = 50; UPDATE content a JOIN
peers_data b ON a.hash = b.hash SET a.seeders = b.seeders, a.leechers
= b.leechers, a.is_updated = b.is_updated; SET GLOBAL innodb_change_buffer_max_size = 25;

It's been 25+ hours and the query is still running.
will update when the query finishes.

one more thing is data is not ultra important, its readily available data and i take daily backups., so later ill try suggestion 3 also. and will use whichever is faster.


update 2 :

suggestion 2 took 38.36667 hours to complete.

now trying suggestion 3.

Best Answer

YOUR QUERY

Your query should be taking a long time. Why ??? Look at your query:

UPDATE content a JOIN peers_data b ON a.hash = b.hash
SET a.seeders = b.seeders, a.leechers = b.leechers, a.is_updated = b.is_updated;

Please note the columns begin updated by the query in the content table:

  • seeders
  • leechers
  • is_updated

Which of these columns are indexed ?

  • seeders
  • leechers

What indexes do you have in the content table involving those columns ?

KEY `seeders` (`seeders`), <<<<-------------------------------- THIS ONE !!!
KEY `uploader_sub_category` (`uploader`,`sub_category`),
KEY `upload_date` (`upload_date`),
KEY `uploader_upload_date` (`uploader`,`upload_date`),
KEY `leechers` (`leechers`), <<<<------------------------------ THIS ONE !!!
KEY `size` (`size`),
KEY `uploader_seeders` (`uploader`,`seeders`), <<<<------------ THIS ONE !!!

What is going on under the hood ???

If you are changing the values for seeders and leeches, these three(3) indexes are having their leaf nodes reshuffled.

Even if the the majority of the values are not being changed, rows are being locked and copies of your data are being stockpiled in your undo logs (for the sake of MVCC). This results is additional disk I/O (ibdata1 should be growing)

INNODB

The InnoDB Buffer Pool will be going through the "Perfect Storm". Why ???

Please note the InnoDB Architecture (Picture from Percona CTO Vadim Tkachenko)

InnoDB Plumbing

Please note the Insert Buffer. Here is what the MySQL Documentation says about it:

The change buffer is a special data structure that caches changes to secondary index pages when affected pages are not in the buffer pool. The buffered changes, which may result from INSERT, UPDATE, or DELETE operations (DML), are merged later when the pages are loaded into the buffer pool by other read operations.

Unlike clustered indexes, secondary indexes are usually nonunique, and inserts into secondary indexes happen in a relatively random order. Similarly, deletes and updates may affect secondary index pages that are not adjacently located in an index tree. Merging cached changes at a later time, when affected pages are read into the buffer pool by other operations, avoids substantial random access I/O that would be required to read-in secondary index pages from disk.

Periodically, the purge operation that runs when the system is mostly idle, or during a slow shutdown, writes the updated index pages to disk. The purge operation can write disk blocks for a series of index values more efficiently than if each value were written to disk immediately.

Change buffer merging may take several hours when there are numerous secondary indexes to update and many affected rows. During this time, disk I/O is increased, which can cause a significant slowdown for disk-bound queries. Change buffer merging may also continue to occur after a transaction is committed. In fact, change buffer merging may continue to occur after a server shutdown and restart (see Section 14.21.2, “Forcing InnoDB Recovery” for more information).

In memory, the change buffer occupies part of the InnoDB buffer pool. On disk, the change buffer is part of the system tablespace, so that index changes remain buffered across database restarts.

Here is where the "Perfect Storm" comes in: By default, InnoDB storage engine reserves up to 25% of the Buffer Pool for Change Buffering. All changes to your three(3) secondary indexes has to pile up in the InnoDB Bufffer Pool's Insert Buffer. When the actual index pages land in the Buffer Pool, the merge process will subsequently be pushed to disk (Note the Insert Buffer inside ibdata1), producing more disk I/O.

SUGGESTIONS

SUGGESTION #1

Get rid of those indexes. Why ??? This will eliminate the change buffering needed for managing those indexes during your mass UPDATE.

Run the following query:

SELECT
    COUNT(1) rowcount,
    COUNT(DISTINCT seeders) seeders_count,
    COUNT(DISTINCT leechers) leechers_count
FROM content;

If seeders_count is less than 5% of rowcount, then the cardinality of seeders_count can disqualify the seeders from being of any use. Smae going with leechers.

For that index uploader_seeders, run this query:

SELECT COUNT(1) uploader_seeders_count FROM
(SELECT DISTINCT uploader,seeders FROM content) A;

If uploader_seeders_count is less than 5% of rowcount (from previous query), then get rid of uploader_seeders index.

To get rid of those indexes, run this:

ALTER TABLE content DROP INDEX uploader_seeders,DROP INDEX seeders,DROP INDEX leechers;

SUGGESTION #2

Change the Insert Buffer size to the max value, the run the query:

SET GLOBAL innodb_change_buffer_max_size = 50;
UPDATE content a JOIN peers_data b ON a.hash = b.hash
SET a.seeders = b.seeders, a.leechers = b.leechers, a.is_updated = b.is_updated;
SET GLOBAL innodb_change_buffer_max_size = 25;

Also, set the Buffer Pool Size (innodb_buffer_pool_size) to 20G. Change that value in my.cnf.

If you have MySQL 5.7, simply run

mysql> SET GLOBAL innodb_buffer_pool_size = 1024 * 1024 * 1024 * 20;

If you have MySQL 5.6 and prior, you must restart mysqld.

SUGGESTION #3 (RISKY)

If you do not want to drop the indexes, you could disable change buffering during the mass UPDATE, then enable it afterwards:

Run the following:

SET GLOBAL innodb_change_buffering = 'none';
UPDATE content a JOIN peers_data b ON a.hash = b.hash
SET a.seeders = b.seeders, a.leechers = b.leechers, a.is_updated = b.is_updated;
FLUSH TABLES;
SET GLOBAL innodb_change_buffering = 'all';

This is risky because this speeds up changes to indexes in favor of not having buffering to recover in the event of a crash or reboot.

SUGGESTION #4 (RISKY)

Another cavalier approach would be to disable the Double Write Buffer. Since a restart is required, do this:

  • service mysql restart --skip-innodb_doublewrite
  • Do your mass UPDATE
  • service mysql restart

This is not recommended for Production. Dev and Staging only please !!!