MySQL 5.5 – How to Use 128 GB RAM to Improve Index Creation

configurationmemoryMySQL

This is similar to another post of mine, but now I have a different table structure, and it is still unclear which MySQL parameters should be changed to take advantage of the extra RAM my machine has. If it seems like a duplicate, let me know how best to rephrase the question.

I have the following table: create table mytable (id1 int, id2 int, score float) engine MyISAM, with 50 billion records.

I plan to add an index on the first column (alter table mytable add index myindex (id1)), and I am wondering which MySQL parameters could be changed to take advantage of the extra memory (e.g. buffers)?

Note: the engine type does not need to be MyISAM if that makes a difference.

Best Answer

Your first concern is the data loading. Since you don't intend to have a primary key, just an index, you should sort your data by the index column id1 before loading, on any UNIX flavor:

sort -n mytable.txt > mytable.sort
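Since the machine has 128 GB of RAM, it is worth letting sort actually use it. A sketch assuming GNU coreutils sort; the buffer size and thread count are illustrative, and the tiny generated file stands in for the real mytable.txt:

```shell
# Sketch assuming GNU coreutils sort: a large in-memory buffer (-S) and
# several threads (--parallel) make better use of a big machine.
# Demo on a tiny file; substitute the real mytable.txt and a buffer
# closer to the available RAM (e.g. -S 60G) in practice.
printf '3\t1\t0.5\n1\t2\t0.1\n2\t9\t0.9\n' > mytable.txt
sort -n -k1,1 -S 64M --parallel=2 mytable.txt > mytable.sort
head -n 1 mytable.sort    # the row with the smallest id1 comes first
```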

For loading the data, use either mysqlimport or LOAD DATA from the MySQL prompt. To speed up the loading, MyISAM tables benefit from increasing *bulk_insert_buffer_size*, *myisam_sort_buffer_size*, and *key_buffer_size*; for InnoDB tables, increase *innodb_buffer_pool_size* and *innodb_log_file_size*.
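As a rough my.cnf sketch: the sizes below are illustrative guesses for a 128 GB machine, not tuned recommendations, and note that in 5.5 the combined InnoDB log file size is capped at 4 GB:

```ini
[mysqld]
# MyISAM: loading and index builds
bulk_insert_buffer_size = 8G
myisam_sort_buffer_size = 32G
key_buffer_size         = 32G

# InnoDB alternative
innodb_buffer_pool_size = 96G
innodb_log_file_size    = 1G
```

Restart the server after editing and verify the values with SHOW VARIABLES.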

These increases might not be sufficient, and the loading may still slow down as memory fills up. In that case, it is more efficient to load the data in chunks. You can monitor the loading speed by running an hourly du in the MySQL data directory:

#!/bin/bash
while true
do
    du -hs database/.
    sleep 3600
done
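The chunked loading itself can be sketched with split. A toy version follows; the file names, chunk size, and the mysql invocation in the comment are assumptions:

```shell
#!/bin/bash
# Sketch of loading in chunks. The demo splits a tiny stand-in file
# into 3-line pieces; in practice, split the sorted dump into chunks
# on the order of 10^8 lines and load each one in turn.
seq 10 | awk '{print $1 "\t" $1 "\t0.5"}' > mytable.sort
split -l 3 mytable.sort chunk_
for f in chunk_*; do
    # Real load would be something like (names are assumptions):
    # mysql mydb -e "LOAD DATA LOCAL INFILE '$f' INTO TABLE mytable"
    echo "would load $f"
done
```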

Just for completeness (not relevant for this specific case, since the index is added after loading): disabling indexes also speeds up the loading process; on MyISAM you can do this with ALTER TABLE mytable DISABLE KEYS before the load and ENABLE KEYS afterwards.

Now to querying: this is a bit more complex, as it depends on your data and the way you want to query your table. The biggest factors that influence performance are:

  • buffers: set by server variables. Execute your typical queries and compare the SHOW VARIABLES and SHOW STATUS output to identify bottlenecks.

  • indexes: you already covered that

  • joins: Depending on the way you join with other tables, it might be
    better to denormalize your table(s) in a data warehouse style,
    especially if you want to join two large tables.
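For the buffers point, one concrete signal from SHOW STATUS is the key cache miss ratio, Key_reads / Key_read_requests. A sketch with made-up counter values; in practice you would parse the output of mysql -e "SHOW GLOBAL STATUS LIKE 'Key_read%'":

```shell
# Hypothetical SHOW STATUS counters; substitute the real values.
key_read_requests=1000000
key_reads=42000
# A miss ratio well above ~0.01 suggests key_buffer_size is too small.
awk -v r="$key_reads" -v q="$key_read_requests" \
    'BEGIN { printf "miss ratio: %.3f\n", r / q }'
```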

Great articles on data loading and large-table query optimization can be found at:
http://www.mysqlperformanceblog.com/2007/05/24/predicting-how-long-data-load-would-take/
http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-could-be-slow-with-large-tables/

Review the default and allowed values for these and other variables at: http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html