MySQL Indexing – How to Optimize MySQL Setup for Faster Index Creation

indexmyisamMySQLunique-constraint

I have a server running Ubuntu 10.04 with Mysql 5.1x installed via package. The system has 128GB ram, 8 cores, and has 4TB's free space where Mysql & Mysql tmp are stored.

I have an MyISAM like this:

CREATE TABLE `data_store` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `uniqname` varchar(150) NOT NULL,
  `data` blob,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM CHARSET=latin1;

I inserted 800mil records (about 350gb data before inserting), then attempted to add the following index:

ALTER TABLE data_store DISABLE KEYS;
ALTER TABLE data_store ADD INDEX uniqname_index (uniqname);
ALTER TABLE data_store ENABLE KEYS;

(Regarding the DISABLE KEYS command, I saw that suggested elsewhere to use BEFORE inserting data, and since the command never got to ENABLE KEYS, I assume it is serving no purpose for me. I've mainly just included to be thorough in my description of what I'm doing.)

When I start the index job, the first status that shows in SHOW PROCESSLIST is "copying to tmp table".

After a few hours, I check bac, and the status remains on the "Repair With Keycache" message even after 24 hours. I tried running the job on a slightly older server, and after 3 days, it still remains with status "Repair with keycache". Because of this, I've canceled the create-index command on this newer machine.

I've read that the "Repair With Keycache" can be very slow, and "Repair by sorting" is preferred in many cases.

Based on some Stack Exchange and random posts online, I added the following settings to my Mysql server:

myisam_sort_buffer_size = 80G
bulk_insert_buffer_size = 80G
myisam_repair_threads   = 8
max_heap_table_size     = 20G
myisam_max_sort_file_size = 500G
tmp_table_size          = 20G
key_buffer_size         = 20G
sort_buffer_size        = 20G
join_buffer_size        = 20G

I restarted the job, and the same process happened again (copy to tmp file, then repair via key cache).

After I killed the job, I noticed in mysql/error.log, there is a message, "myisam_sort_buffer_size is too small". This happened earlier in the day, and not exactly when I killed the job.

questions

  • Am I going about this wrong way? I just want to be able to lookup my data quickly by some key (uniqname).

  • Is there any benefit to starting over w/ my table, adding the index from the beginning, use the DISABLE KEYS command, insert my 800mil records, and then ENABLE KEYS? I read elsewhere that this can prevent the copying tmp table (which may only save me a few hours?)

  • Do I want this 'Repair by sorting'?

Best Answer

For starters, I would not touch the buffer sizes just yet. The sizes youhave in the question are monstrously too big.

Here is another observation: You have BLOB data. Ouch, your temp table is going to eat space rather quickly. You could do somehting like this:

Create a 32GB RAM Disk called /var/tmpfs by adding this line to /etc/fstab

none                    /var/tmpfs              tmpfs   defaults,size=32g        1 2

Next, create a folder called /mysqltmp and mount the RAM disk on it

mkdir /mysqltmp
chown mysql:mysql /mysqltmp
mount /mysqltmp /var/tmpfs

Add this to my.cnf and restart mysql

[mysqld]
tmpdir=/mysqltmp

Now, any tmp table made via DDL lands in the RAM disk.

Here is yet another observation: Why not create a separate table that keeps the BLOB data away from the unique names?

CREATE TABLE `data_store_name` SELECT id,uniqname FROM `data_store` WHERE 1=2;
ALTER TABLE `data_store_name` ADD PRIMARY KEY (id);
ALTER TABLE `data_store_name` ADD UNIQUE KEY (uniqname);
ALTER TABLE `data_store_name` ADD INDEX name_id_ndx (uniqname,id);
INSERT INTO `data_store_name` SELECT id,uniqname FROM `data_store`;

This will prevent any moving around of BLOB data when indexing.

From here, you would have to always join data_store using its name like this:

SELECT
    A.uniqname,B.data
FROM
    (SELECT * FROM data_store_name WHERE uniqname = 'mydataname') A
    LEFT JOIN
    data_store B USING (id)
;

Making these changes will sidestep this whole mess of dealing with keycache, RAM disks, and tmp tables.

Give it a Try !!!