MySQL out of memory on huge query

memoryMySQL

I've identified a huge query which is causing my MySQL server to run out of memory. Unfortunately, I don't have access to the application side, so I'm trying to prevent the crash.

Here is the query, there is no fixed date/time for it to occur:

SELECT SQL_CALC_FOUND_ROWS * 
FROM logs  
WHERE 
  ctime > '1489202077' AND ctime <= '1489212877'   
  AND ( (
    ((msg_type IN ('1', '10', '11')  AND status IN ('0','1','2')) )
    OR  
    ((msg_type IN ('0', '10', '11')  AND status IN ('0','1','2','3')) 
   ) ))  
   AND msg_type NOT IN ('4','5')  
   AND ((host_id IN ('14','19',.... >,'406602','406603')) -- around 400000 values !
  )
  ORDER BY ctime DESC, host_name ASC, log_id DESC, service_description ASC  
  LIMIT 0, 30 ;

When I execute it, MySQL occupies all the RAM little by little (like a memory leak) and is killed by the OOM killer.

In the process list, the query is frozen in the "statistics" state. Even if I kill it, it remains:

| 633363 | *** | *** | *** | Killed |    334 | statistics  | SELECT SQL_CALC_FOUND_ROWS * FROM logs  WHERE ctime > '1489406239' AND ctime <= '1489417039'   AND ( | 0 | 0 |

The situation is exactly the same with an EXPLAIN before the SELECT.

I didn't find any clue on saturated buffers, increasing InnoDB buffer pool size doesn't change anything, I guess the leak occurs at the preparation stage. I've been able to slow down that leak with optimizer_search_depth=0 but I don't know if it is a good idea.

The target table (logs) size is 9GB, partitioned by ranges and got the following indexes :

  PRIMARY KEY (`log_id`,`ctime`),
  KEY `host_name` (`host_name`(64)),
  KEY `service_description` (`service_description`(64)),
  KEY `status` (`status`),
  KEY `instance_name` (`instance_name`),
  KEY `ctime` (`ctime`),
  KEY `rq1` (`host_id`,`service_id`,`msg_type`,`status`,`ctime`),
  KEY `rq2` (`host_id`,`msg_type`,`status`,`ctime`),
  KEY `host_id` (`host_id`,`service_id`,`msg_type`,`ctime`,`status`),
  KEY `host_id_2` (`host_id`,`msg_type`,`ctime`,`status`)

Here are some useful informations on configuration (MySQL 5.6.29-76.2, 64GB of RAM, all tables InnoDB), the other parameters are the defaults ones.

innodb_buffer_pool_instances = 16
innodb_buffer_pool_size = 16G
innodb_log_file_size    = 4G
innodb_log_buffer_size = 32M
innodb_file_per_table   = 1
innodb_file_format      = Barracuda
innodb_data_file_path   = ibdata1:12M:autoextend
tmp_table_size      = 32M
max_heap_table_size = 32M
query_cache_type    = 0
query_cache_size    = 0
max_connections     = 150
thread_cache_size   = 200
thread_pool_size    = 20
open_files_limit    = 65535
max_allowed_packet  = 16M
max_connect_errors  = 100000
sql_mode            = NO_ENGINE_SUBSTITUTION
userstat            = on
optimizer_search_depth = 7
sort_buffer_size      = 4M
read_rnd_buffer_size  = 2M
join_buffer_size      = 2M

I didn't activate the query cache because there is a lot of inserts and updates on this database.

System log files (classical OOM kill), nothing in mysqld.err :

Node 0 DMA free:15752kB min:12kB low:12kB high:16kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15360kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB sla
b_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 1918 64538 64538
Node 0 DMA32 free:252496kB min:2008kB low:2508kB high:3012kB active_anon:314728kB inactive_anon:318108kB active_file:4kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1964380kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB sla
b_reclaimable:0kB slab_unreclaimable:4kB kernel_stack:0kB pagetables:1356kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 62620 62620
Node 0 Normal free:65056kB min:65556kB low:81944kB high:98332kB active_anon:61713812kB inactive_anon:2625048kB active_file:204kB inactive_file:5108kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:64122880kB mlocked:0kB dirty:0kB writeback:332kB mapped:512
kB shmem:4kB slab_reclaimable:12356kB slab_unreclaimable:48404kB kernel_stack:6848kB pagetables:148960kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:288 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
(...)
[ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
[ 2461]   497  2461 19603782 16238066  10       0             0 mysqld
(...)
Out of memory: Kill process 2461 (mysqld) score 987 or sacrifice child
Killed process 2461, UID 497, (mysqld) total-vm:78415128kB, anon-rss:64951604kB, file-rss:532kB
mysqld invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0, oom_score_adj=0
mysqld cpuset=/ mems_allowed=0

We see active_anon in node 0 Normal reaches 61713812kB while it is only 17319232kb usually.

Lastly, here is a list of actions I tried with no success :

  • ANALYSE, CHECK and OPTIMIZE on Logs
  • innodb_log_buffer_size : 32M (8M before)
  • optimizer_search_depth : 7 (62 before)
  • innodb_buffer_pool_size: 40G (16G before)
  • innodb_use_sys_malloc: 0 + innodb_additional_mem_pool_size: 8G (free some cache buffer but no change about the crash)

I was wondering if innodb_flush_method: O_DIRECT would help but because there is a lot of updates and inserts and because MySQL is crashing very often, I fear to lose data.

Any help or clue would be very appreciated !

Edit : The table logs contains 55773961 records. Here is result for 'SHOW CREATE TABLE Logs' :

CREATE TABLE `logs` (
  `log_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `ctime` int(11) unsigned NOT NULL DEFAULT '0',
  `host_id` int(11) DEFAULT NULL,
  `host_name` varchar(255) DEFAULT NULL,
  `instance_name` varchar(255) NOT NULL,
  `issue_id` int(11) DEFAULT NULL,
  `msg_type` tinyint(4) DEFAULT NULL,
  `notification_cmd` varchar(255) DEFAULT NULL,
  `notification_contact` varchar(255) DEFAULT NULL,
  `output` text,
  `retry` int(11) DEFAULT NULL,
  `service_description` varchar(255) DEFAULT NULL,
  `service_id` int(11) DEFAULT NULL,
  `status` tinyint(4) DEFAULT NULL,
  `type` smallint(6) DEFAULT NULL,
  PRIMARY KEY (`log_id`,`ctime`),
  KEY `host_name` (`host_name`(64)),
  KEY `service_description` (`service_description`(64)),
  KEY `status` (`status`),
  KEY `instance_name` (`instance_name`),
  KEY `ctime` (`ctime`),
  KEY `rq1` (`host_id`,`service_id`,`msg_type`,`status`,`ctime`),
  KEY `rq2` (`host_id`,`msg_type`,`status`,`ctime`),
  KEY `host_id` (`host_id`,`service_id`,`msg_type`,`ctime`,`status`),
  KEY `host_id_2` (`host_id`,`msg_type`,`ctime`,`status`)
) ENGINE=InnoDB AUTO_INCREMENT=262896188 DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE (ctime)
(PARTITION p201603 VALUES LESS THAN (1459461600) ENGINE = InnoDB,
 PARTITION p201604 VALUES LESS THAN (1462053600) ENGINE = InnoDB,
 PARTITION p201605 VALUES LESS THAN (1464732000) ENGINE = InnoDB,
 PARTITION p201606 VALUES LESS THAN (1467324000) ENGINE = InnoDB,
 PARTITION p201607 VALUES LESS THAN (1470002400) ENGINE = InnoDB,
 PARTITION p201608 VALUES LESS THAN (1472680800) ENGINE = InnoDB,
 PARTITION p201609 VALUES LESS THAN (1475272800) ENGINE = InnoDB,
 PARTITION p201610 VALUES LESS THAN (1477954800) ENGINE = InnoDB,
 PARTITION p201611 VALUES LESS THAN (1480546800) ENGINE = InnoDB,
 PARTITION p201612 VALUES LESS THAN (1483225200) ENGINE = InnoDB,
 PARTITION p201701 VALUES LESS THAN (1485903600) ENGINE = InnoDB,
 PARTITION p201702 VALUES LESS THAN (1488322800) ENGINE = InnoDB,
 PARTITION p201703 VALUES LESS THAN (1490997600) ENGINE = InnoDB,
 PARTITION pMAX VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */

Best Answer

INDEX(ctime) is the only one that might be useful. Get rid of all the other secondary indexes unless you have other queries that would specifically benefit.

Since it says 'statistics', I suspect it is running through all the possible indexes doing various calculations. With 400K items in host_id, and several indexes using that column, that is why I am saying to get rid of at least those indexes.

Prefix indexes are rarely useful; I advise tossing them anyway.

A single column index on a low-cardinality column (status?) is unlikely to be chosen, so it is simply a waste.

Building a covering index is not practical (at least because of SELECT *).

Building an index to handle the ORDER BY is not possible because of the mixture of ASC and DESC.

OR makes it impossible to do anything with msg_type and status. Switching to UNION would not help enough.