Mysql – Optimize speed on 400GB MyISAM table

Tags: mariadb, myisam, mysql

We have a MariaDB server with a single table holding image URL information. The table is about 400GB on disk and contains roughly 400M rows.

The table is partitioned into 1024 partitions.

All queries are similar to this one:

select * from container where id in (1234, 1235 ... );

Such a query usually takes about 2 seconds to execute.

Each row contains a single image URL, a title, and keywords.

key_buffer_size is set to 8GB.

This setup works well until we begin to insert additional rows. We tried normal inserts and also LOW_PRIORITY inserts; it is slow in both cases.

I wonder what else we can tweak in order to speed up the SELECTs.

Update:
The table has no indexes except the primary key on a BIGINT field, i.e. PRIMARY KEY (id).

Update 2:
Here is some more information:

Create Table

CREATE TABLE `container` (
  `id` bigint(20) NOT NULL,
  `data` blob NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=binary
/*!50100 PARTITION BY KEY (id)
PARTITIONS 1024 */

The data field stores standard JSON. It is UTF-8 text, but because of wrong encoding in the input data, we were forced to store it as a BLOB with the binary charset.

Size on disk

# du -h /usr/local/mysql/var/mydb/
371G    /usr/local/mysql/var/mydb/

Count(*)

> select count(*) from container;
+-----------+
| count(*)  |
+-----------+
| 409036295 |
+-----------+
1 row in set (0.04 sec)

my.ini

[mysqld]
server-id = 1

port=3306
socket=/tmp/mysql.sock
skip_name_resolve

open-files-limit=64000

#Flush every 15 min (900 sec)
set-variable = flush_time=900

#Max Clients
set-variable = max_connections=5050
set-variable = max_user_connections=5000
set-variable = back_log=50

set-variable = table_open_cache=1024
set-variable = table_definition_cache=1024

#INSERT While SELECT-ing. Default is 1 (1 = On if have no hole, 2 = On if have hole)
set-variable = concurrent_insert=2

#Interactive timeout 60 min (from console)
set-variable = interactive_timeout=3600

#non-interactive timeout 3 hours
set-variable = wait_timeout=10800

set-variable = key_buffer_size=8192M
set-variable = max_allowed_packet=5M
set-variable = sort_buffer_size=256M

set-variable = tmp_table_size=512M
set-variable = max_heap_table_size=64M

#all updates will wait for selects
set-variable = low_priority_updates=1

#preforking
set-variable = thread_cache_size=64

#----- SLOW QUERIES -----

set-variable = long_query_time=2
set-variable = log_slow_queries=mysql-slow.log

#----- CACHE -----

# SELECT SQL_CACHE * from x
set-variable = query_cache_type=0
set-variable = query_cache_limit=1M
set-variable = query_cache_size=128M

Best Answer

Assuming you are SELECTing only by the PRIMARY KEY, then the following will speed it up:

  • NO PARTITIONing. It only slows down such queries, especially with 1024 partitions.
  • Switch to InnoDB. The PRIMARY KEY is "clustered" with the data, which saves a disk hit on each row fetched. Shrink key_buffer_size to 50M and raise innodb_buffer_pool_size to 70% of available RAM. InnoDB also avoids MyISAM's table-level locks.
  • Compress the JSON text in the client; the datatype then needs to be BLOB (which it already is, if for the wrong reason).
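The client-side compression in the last bullet can be sketched as follows. This is a minimal illustration, assuming a Python client; pack_row/unpack_row are hypothetical helper names, not part of any driver API:

```python
import json
import zlib

def pack_row(doc: dict) -> bytes:
    """Serialize JSON and deflate it before it goes into the BLOB column.
    (Hypothetical helper; the column stays BLOB, as in the schema above.)"""
    return zlib.compress(json.dumps(doc).encode("utf-8"), 6)

def unpack_row(blob: bytes) -> dict:
    """Reverse of pack_row: inflate, then parse the JSON."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))

# Round-trip check on a sample document
doc = {"url": "http://example.com/img.jpg", "title": "demo",
       "keywords": ["landscape", "sunset"]}
blob = pack_row(doc)
assert unpack_row(doc=None) if False else unpack_row(blob) == doc
```

JSON with repeated keys typically deflates to a third or less of its original size, which is where the 2-3x InnoDB expansion mentioned below gets won back.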

Note: InnoDB will expand the disk space by a factor of 2-3; the compression will get that space back. (Please test this with, say, a million rows. Experiment with the different ROW_FORMATs; I don't know which one will work best for your situation. And validate the SELECT speed and the lack of interference between reads and writes.)
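The conversion itself might look like the sketch below. Run it against a test copy first, as advised: rebuilding a 400GB table takes hours and needs roughly the table's size in free disk space. The ROW_FORMAT choice is the thing to benchmark:

```sql
-- Drop the 1024 KEY partitions first
ALTER TABLE container REMOVE PARTITIONING;

-- Convert to InnoDB; try DYNAMIC vs COMPRESSED on a sample
ALTER TABLE container ENGINE=InnoDB ROW_FORMAT=DYNAMIC;
```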

Do you really have 5000 simultaneous connections? They are probably stumbling over each other and slowing each other down.

You have half turned off the query cache; also set query_cache_size=0.
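In my.ini terms, fully disabling the query cache means setting both variables, so the 128M currently reserved is released as well:

```ini
query_cache_type = 0
query_cache_size = 0
```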