MySQL – How to speed up an SQL query against a 204-million-record table

mysql, performance, performance-tuning, query-performance

For some reason my server has been finicky lately and for the life of me I can't figure out why. I monitor the memory usage by running

shell# watch -n 5 free -m

Without running any queries, memory used is anywhere from 4-5 GB. Once I run a query, it increases all the way up to 35 GB. For example, see the output below:

Every 5.0s: free -m                                     Fri Mar 13 10:17:28 2015

             total       used       free     shared    buffers     cached
Mem:         36196      34429       1767          0        124       3172
-/+ buffers/cache:      31132       5064
Swap:        24563          0      24563

The query I am running is:

SELECT DISTINCT(c1),c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12
FROM db1.tb1
WHERE NOT EXISTS (SELECT * FROM db2.tb2 WHERE db2.tb2.c1 = db1.tb1.c1)
and c6='X' and c9 <='y' and c10>= 'Z' and c12>='N'
GROUP BY c1
LIMIT 5000

All columns on both tables are indexed. Normally this query only takes a couple of minutes, but for the last couple of days it has been taking 4+ hours.

The table db1.tb1 has 360 columns and 204 million records, so it is quite large. It uses the InnoDB engine.

Ubuntu v. 12.04

MySQL 5.5

I have 36 GB RAM installed

4-core Intel Xeon processor at 2.93 GHz with hyper-threading enabled.

Below is my my.cnf file (I'm only posting the uncommented portions of the file):

[client]
port        = 3306
socket      = /var/run/mysqld/mysqld.sock
[mysqld_safe]
socket      = /var/run/mysqld/mysqld.sock
nice        = 0

[mysqld]
user        = mysql    
pid-file    = /var/run/mysqld/mysqld.pid
socket      = /var/run/mysqld/mysqld.sock
port        = 3306
basedir     = /usr    
datadir     = /var/lib/mysql
tmpdir      = /tmp
lc-messages-dir = /usr/share/mysql
skip-external-locking
bind-address        = 50.253.189.57
key_buffer      = 5G
max_allowed_packet  = 16M
thread_stack        = 192K
thread_cache_size       = 50
myisam-recover         = BACKUP
max_connections        = 500
table_cache            = 3200
thread_concurrency     = 10
bulk_insert_buffer_size = 1G
join_buffer_size = 500M
sort_buffer_size = 5G
query_cache_limit   = 0
query_cache_size        = 0
open-files-limit = 65535
tmp_table_size = 3G
max_heap_table_size = 1G
table_definition_cache = 1024
table_open_cache = 2048
thread_cache_size = 50
max_connect_errors = 1000000
log_error = /var/log/mysql/error.log
expire_logs_days    = 10
max_binlog_size         = 100M
innodb_buffer_pool_size = 26G
innodb_additional_mem_pool_size = 1G
innodb_buffer_pool_instances = 2
innodb_log_buffer_size = 1G
innodb_file_per_table = 1
innodb_flush_method = O_DIRECT
innodb_log_files_in_group = 2

[mysqldump]
quick
quote-names
max_allowed_packet  = 16M

[isamchk]
key_buffer      = 16

Best Answer

All columns on both tables are indexed.

Enough said. That is not the right way to go about indexing.

WHERE db2.tb2.c1 = db1.tb1.c1

db2.tb2 needs INDEX(c1). Keep in mind that a PRIMARY KEY is a UNIQUE key, and a UNIQUE key is an INDEX, so do not redundantly add INDEX(c1) if you already have PRIMARY KEY(c1).
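
To see what db2.tb2 already has before adding anything (the table and column names here are just the placeholders from the question), something like:

SHOW CREATE TABLE db2.tb2;   -- full definition, including PRIMARY KEY and indexes
SHOW INDEX FROM db2.tb2;     -- or list just the indexes

-- only if c1 is not already the PRIMARY KEY or otherwise indexed:
ALTER TABLE db2.tb2 ADD INDEX idx_c1 (c1);

The index name idx_c1 is my own placeholder; MySQL will pick a name if you omit it.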

and c6='X' and c9 <='y' and c10>= 'Z' and c12>='N'

That WHERE clause needs a compound index; I recommend adding all three of these, since there are no clues as to which would be best:

INDEX(c6, c9)
INDEX(c6, c10)
INDEX(c6, c12)

At the same time, remove INDEX(c6), because it is now redundant; a sketch of all the index changes follows. For further discussion, see the Cookbook on making an INDEX, given a SELECT.
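
A minimal sketch of those index changes. The index names are placeholders of my own, and DROP INDEX c6 assumes the single-column index is literally named c6 -- check SHOW CREATE TABLE for its real name. Also expect the ALTERs to take a while on a 204-million-row table.

ALTER TABLE db1.tb1
    ADD INDEX idx_c6_c9  (c6, c9),
    ADD INDEX idx_c6_c10 (c6, c10),
    ADD INDEX idx_c6_c12 (c6, c12);

-- after confirming the redundant single-column index exists and noting its real name:
ALTER TABLE db1.tb1 DROP INDEX c6;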

tmp_table_size = 3G
max_heap_table_size = 1G

Very dangerous. Are you swapping? Swapping is much worse for performance than lowering some of these settings. Each SELECT that needs an in-memory tmp table can grow it to 1GB (the smaller of those two limits); if you have several such queries running at the same time, you will rapidly chew up RAM. Lower both of those to their defaults.

key_buffer = 5G

That setting is for caching MyISAM indexes. If you are not using MyISAM, set it to 20M. InnoDB is the preferred engine, and innodb_buffer_pool_size = 26G (out of 36GB of RAM) is reasonable if all of your tables are InnoDB.
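
Put together, the my.cnf adjustments above might look roughly like this; 16M is the MySQL 5.5 default for both tmp-table settings, and the exact numbers are assumptions to tune, not prescriptions:

[mysqld]
tmp_table_size      = 16M    # back to the default; this limit applies per query
max_heap_table_size = 16M    # back to the default
key_buffer          = 20M    # only caches MyISAM indexes
innodb_buffer_pool_size = 26G    # keep; reasonable for an all-InnoDB 36GB server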

There may be more suggestions; see how far you get with these.

Edit

DISTINCT(c1),c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12

That may not mean what you think -- DISTINCT is not a function, and the parentheses are irrelevant; it de-dups the entire set of 12 expressions. It is also probably pointless unless you actually have duplicate rows.

Is c1 your PRIMARY KEY, and hence UNIQUE? If so, the GROUP BY c1 is also irrelevant, and removing it may help speed.
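
If c1 really is the PRIMARY KEY, a sketch of the simplified query (same placeholder names as in the question) would be:

SELECT c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12
FROM db1.tb1
WHERE NOT EXISTS (SELECT * FROM db2.tb2 WHERE db2.tb2.c1 = db1.tb1.c1)
  AND c6 = 'X' AND c9 <= 'y' AND c10 >= 'Z' AND c12 >= 'N'
LIMIT 5000;

That is, drop both DISTINCT and GROUP BY c1; with one row per c1 value they cannot remove anything, and the GROUP BY in particular can force an unnecessary sort or temporary table.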