If you already have the schema in place, use mysqldump options to create a dump of the data only (no CREATE TABLE statements), so that restoring it performs just the inserts
mysqldump -h... -u... -p... --no-create-info --databases ... > MySQLData.sql
You should also make sure to raise the bulk_insert_buffer_size (default is 8M) on all DB Servers to accommodate large extended inserts. This will also help LOAD DATA INFILE if loading non-empty tables.
You will also have to adjust your max_allowed_packet (default is 1M).
Try these settings for starters:
[mysqld]
max_allowed_packet=256M
bulk_insert_buffer_size=256M
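Both settings can also be checked and raised at runtime; note that SET GLOBAL does not persist across a restart (so keep the my.cnf entries as well), and a new max_allowed_packet only applies to connections opened after the change:

```sql
-- Inspect the current values (shown in bytes)
SHOW VARIABLES LIKE 'max_allowed_packet';
SHOW VARIABLES LIKE 'bulk_insert_buffer_size';

-- Raise them on the fly (268435456 bytes = 256M)
SET GLOBAL max_allowed_packet = 268435456;
SET GLOBAL bulk_insert_buffer_size = 268435456;
```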
UPDATE 2011-10-11 06:53 EDT
Regardless of how beefy the hardware is, how much RAM you have allocated, how well tuned the OS is, and how current the version of MySQL is, MySQL will only perform as well as it is configured.
Example: If you have a swimming pool that can hold 10,000 gallons of water, you have a truck holding 10,000 gallons of water, and the hose on the truck is only as big as a straw, you can only push but so much water through the straw to fill the pool. Getting a bigger truck or getting a faster water pump simply will not improve things. You must exchange the hose for a much bigger hose to accommodate more water, thus, more throughput.
In like fashion, MySQL 5.5 out-of-the-box does not come fully tuned.
Example #1: MySQL 5.5 comes with semisynchronous replication. By default, it is disabled. You must start mysql, run INSTALL PLUGIN for the master and slave plugins, shut down mysql, add the timing and activation options for the semisync features to my.cnf, then start mysql a second time. Only then will semisync replication work.
Example #2: The bulk insert buffer by default is 8M. It will not grow because of the presence of any specific hardware or software; it stays 8M until you increase it, either by adding the setting to my.cnf and restarting mysql, or by running SET bulk_insert_buffer_size = 268435456; to set it to 256M within a session and then loading the mysqldump within that same session.
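For example, within a single mysql client session (the dump path here is a placeholder):

```sql
-- Raise the bulk insert buffer for this session only (256M in bytes)
SET bulk_insert_buffer_size = 268435456;
-- Load the dump in the same session so the larger buffer is actually used
SOURCE /path/to/MySQLData.sql
```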
Example #3: MySQL 5.5 is fully capable of engaging multiple CPUs. By default, the features for engaging multiple CPUs are disabled. They require tuning because, although MySQL 5.5 is multicore-ready, it is only as multithreaded as you configure it to be.
Conclusion: You must configure MySQL to recognize that it has beefy hardware, more RAM, and a cooperative OS at its disposal.
For starters, I would not touch the buffer sizes just yet. The sizes you have in the question are monstrously too big.
Here is another observation: You have BLOB data. Ouch, your temp tables are going to eat space rather quickly. You could do something like this:
Create a 32GB RAM disk mounted at /mysqltmp by adding this line to /etc/fstab

none /mysqltmp tmpfs defaults,size=32g 0 0

Next, create the /mysqltmp folder, mount the RAM disk on it, and hand ownership to mysql

mkdir /mysqltmp
mount /mysqltmp
chown mysql:mysql /mysqltmp

(The chown must come after the mount, because mounting the tmpfs hides whatever permissions the underlying folder had.)
Add this to my.cnf and restart mysql
[mysqld]
tmpdir=/mysqltmp
Now, any tmp table made via DDL lands in the RAM disk.
Here is yet another observation: Why not create a separate table that keeps the BLOB data away from the unique names?
CREATE TABLE `data_store_name` SELECT id,uniqname FROM `data_store` WHERE 1=2;
ALTER TABLE `data_store_name` ADD PRIMARY KEY (id);
ALTER TABLE `data_store_name` ADD UNIQUE KEY (uniqname);
ALTER TABLE `data_store_name` ADD INDEX name_id_ndx (uniqname,id);
INSERT INTO `data_store_name` SELECT id,uniqname FROM `data_store`;
This will prevent any moving around of BLOB data when indexing.
From here, whenever you look up by name, you would always have to join data_store_name to data_store like this:
SELECT
A.uniqname,B.data
FROM
(SELECT * FROM data_store_name WHERE uniqname = 'mydataname') A
LEFT JOIN
data_store B USING (id)
;
Making these changes will sidestep this whole mess of dealing with keycache, RAM disks, and tmp tables.
Give it a Try !!!
Best Answer
I have tried a variety of solutions with a similar data load (over 1 billion rows), and the best one I have found is this:
From the MySQL documentation:

With some extra work, it is possible to make LOAD DATA INFILE run even faster for a MyISAM table when the table has many indexes. Use the following procedure:

1. Execute a FLUSH TABLES statement or a mysqladmin flush-tables command.
2. Use myisamchk --keys-used=0 -rq /path/to/db/tbl_name to remove all use of indexes for the table.
3. Insert data into the table with LOAD DATA INFILE. This does not update any indexes and therefore is very fast.
4. Re-create the indexes with myisamchk -rq /path/to/db/tbl_name. This creates the index tree in memory before writing it to disk, which is much faster than updating the index during LOAD DATA INFILE because it avoids lots of disk seeks. The resulting index tree is also perfectly balanced.
5. Execute a FLUSH TABLES statement or a mysqladmin flush-tables command.

LOAD DATA INFILE performs the preceding optimization automatically if the MyISAM table into which you insert data is empty. The main difference between automatic optimization and using the procedure explicitly is that you can let myisamchk allocate much more temporary memory for the index creation than you might want the server to allocate for index re-creation when it executes the LOAD DATA INFILE statement.

In order to obtain better performance from myisamchk, you have to tune some of its parameters, depending on your hardware architecture.
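The documented procedure can be sketched as a shell script. Everything here is an assumption you must adapt: the database/table names, the datadir path, the CSV path, and the myisamchk buffer sizes (size those to your available RAM).

```shell
#!/bin/sh
# Sketch of the manual disable-keys / bulk-load / rebuild procedure for a
# MyISAM table. All names and paths below are placeholders, not defaults.
DB=mydb
TBL=mytable
DATADIR=/var/lib/mysql              # adjust to your actual datadir

mysqladmin flush-tables

# Step 2: remove all use of indexes for the table (operates on the .MYI file)
myisamchk --keys-used=0 -rq "$DATADIR/$DB/$TBL"

# Step 3: bulk load with no index maintenance -- this is the fast part
mysql -e "LOAD DATA INFILE '/path/to/data.csv' INTO TABLE $DB.$TBL"

# Step 4: rebuild the index tree in memory before writing it to disk
myisamchk -rq --key_buffer_size=512M --sort_buffer_size=512M "$DATADIR/$DB/$TBL"

mysqladmin flush-tables
```

The buffer values on the myisamchk line are illustrative; the whole point of the explicit procedure is that myisamchk can be given far more memory for the rebuild than you would leave allocated in the server.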
Note

When using LOCAL with LOAD DATA, a copy of the file is created in the server's temporary directory. This is not the directory determined by the value of tmpdir or slave_load_tmpdir, but rather the operating system's temporary directory, and it is not configurable in MySQL Server.

So, if you have this kind of problem and your file is a CSV, you can split your "huge" file into chunks.
Then repeat your LOAD DATA LOCAL INFILE (step 3) for every chunk file.
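As a sketch of the chunking step, the standard split utility works well; here a 10-line stand-in file takes the place of the real CSV (the filenames and line counts are illustrative):

```shell
# Fabricate a small stand-in for the huge CSV (10 lines)
seq 1 10 > bigfile.csv

# Split into fixed-size chunks: chunk_aa, chunk_ab, chunk_ac, chunk_ad
# (with a real multi-GB file you would use something like -l 1000000)
split -l 3 bigfile.csv chunk_

# Each chunk file is then fed to its own LOAD DATA LOCAL INFILE statement
ls chunk_*
```

If the CSV has a header row, strip it first (e.g. with tail -n +2) so the header does not get loaded as data in the first chunk.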