MySQL – Updating/Inserting 2000 Entries Takes Over 12 Hours

amazon-rds, MySQL, mysql-5.6

I have three tables of around 10 million rows each in MySQL on Amazon RDS and I am finding the insert/update performance to be VERY slow. Each table contains unrelated data.

Each table is regularly updated with about 2000 rows each time. The rows either update existing values or insert new ones. I am finding that updating or inserting 2000 rows takes over 12 hours to perform.

The database has 100 GB of General Purpose SSD storage, which can sustain a baseline of 300 IOPS (3 IOPS per GB). CPU utilisation is below 20% and half of the RAM is free.

I am using an index, and a primary key constructed from two columns of the table: CONSTRAINT PK_1 PRIMARY KEY (DATE, NAME)

I am using the following query:

    INSERT INTO Table (DATE, NAME, COLUMN_1, ..., COLUMN_10)
    VALUES 
    ('2015-05-26','David', VALUE_1, ... , VALUE_10),
    ...
    ('2015-05-26','Tom', VALUE_1, ... , VALUE_10)
    ON DUPLICATE KEY UPDATE COLUMN_1=VALUES(COLUMN_1), ... , COLUMN_10=VALUES(COLUMN_10);

The statement above inserts/updates 7 rows of 10 columns each; this is repeated until all 2000 entries (users) have been written.

The reason I chose this statement over a REPLACE INTO statement is that, while the table is 40 columns wide, I only receive 10 of those columns at a time. The values in a row can also keep changing for up to a week.
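To illustrate why (using the same hypothetical column names as above, with the other columns labelled COLUMN_11..COLUMN_40 for the sake of the example): REPLACE INTO works by deleting the existing row and inserting a new one, so the 30 columns not supplied in the batch would be reset to their defaults, whereas ON DUPLICATE KEY UPDATE overwrites only the columns named:

    -- REPLACE deletes the matching row first, so COLUMN_11..COLUMN_40
    -- (not present in this statement) would be lost:
    REPLACE INTO Table (DATE, NAME, COLUMN_1, ..., COLUMN_10)
    VALUES ('2015-05-26', 'David', VALUE_1, ..., VALUE_10);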

By default, the data was being inserted into the three tables sequentially. I have tried inserting the data in parallel. However, there was no significant improvement in the performance.

I don't understand why it takes over 12 hours to insert 2000 rows. What are my options for improving performance?
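For scale, a quick back-of-the-envelope calculation using the figures above (plain arithmetic, nothing MySQL-specific):

    # Rough throughput check using the numbers from the question.
    rows = 2000
    seconds = 12 * 3600                  # "over 12 hours"
    iops = 300                           # baseline for 100 GB General SSD

    per_row = seconds / rows             # seconds spent per row
    ios_per_row = seconds * iops / rows  # I/O budget per row at 300 IOPS

    print(per_row, ios_per_row)          # 21.6 seconds and 6480 I/Os per row

Even if every upserted row cost dozens of random I/Os, 300 IOPS should get through 2000 rows in minutes, not hours, so raw storage throughput alone cannot explain a 12-hour runtime.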

Best Answer

I'm going to give this my best guess, but I will preface it by saying that we don't have intimate knowledge of your schema, and while I am knowledgeable about MySQL, I would not call myself an expert.

Index-organized tables

One problem you might be having is due to the way in which MySQL stores data, along with your UPSERT behavior. Using the InnoDB engine in MySQL, every table is an index-organized table. This means that the data payload is stored directly within the pages of the index. In effect, the primary key index is also the table structure.

Now, in your case, you are inserting rows with a total of 42 columns (the two key columns plus 40 others). Without knowing the table structure, I am guessing that some of these 40 columns are of variable length, so each time you update a row there is likely to be quite a lot of data movement: InnoDB has to rewrite the index and data pages to accommodate the new data.
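One way to check whether this page rewriting is really where the time goes is to sample InnoDB's status counters immediately before and after one of your 2000-row batches (these are standard InnoDB status variables; this is only a diagnostic sketch):

    -- Run before and after a batch; large jumps point at write
    -- amplification or lock waits rather than raw statement cost.
    SHOW GLOBAL STATUS LIKE 'Innodb_data_writes';             -- total write operations
    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty'; -- pages awaiting flush
    SHOW GLOBAL STATUS LIKE 'Innodb_row_lock_time';           -- ms spent waiting on row locks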

Change your schema?

So, based on this likely limitation, I think one thing you should consider is a schema change. What you have now is essentially one big flat table. But you've stated that you receive only 10 of those 40 columns at a time, so why not split it into at least 5 tables? That is (in loose pseudocode, and make sure to add your FK relationships):

CREATE TABLE record (RecID, DATE, NAME);
CREATE TABLE column_set_1 (RecID,COL1,...,COL10);
CREATE TABLE column_set_2 (RecID,COL11,...,COL20);
CREATE TABLE column_set_3 (RecID,COL21,...,COL30);
CREATE TABLE column_set_4 (RecID,COL31,...,COL40);

This way, you are only performing inserts, and only into the relevant tables without touching all this other data.
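A slightly more concrete version of that pseudocode (the column types and name length are placeholders I've assumed, and only the first column_set table is shown):

    CREATE TABLE record (
        RecID  INT UNSIGNED NOT NULL AUTO_INCREMENT,
        `DATE` DATE         NOT NULL,
        NAME   VARCHAR(64)  NOT NULL,          -- assumed length
        PRIMARY KEY (RecID),
        UNIQUE KEY UK_record (`DATE`, NAME)
    ) ENGINE=InnoDB;

    CREATE TABLE column_set_1 (
        RecID INT UNSIGNED NOT NULL,
        COL1  INT NULL,                        -- placeholder types
        -- COL2 ... COL9 along the same lines
        COL10 INT NULL,
        PRIMARY KEY (RecID),
        CONSTRAINT FK_column_set_1 FOREIGN KEY (RecID) REFERENCES record (RecID)
    ) ENGINE=InnoDB;

Each incoming batch then touches record plus whichever single column_set_* table the 10 supplied columns belong to.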

Table Partitioning

Also, because you have a DATE field, it may be helpful to partition your table. I can't say for sure if this will have a strong positive impact, but I suspect it may. If you go for my suggested schema change, for example, I might choose to hash partition the 4 column_set_* tables based on RecID and then the INSERT...ON DUPLICATE KEY UPDATE can make use of partition lock pruning.
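A minimal sketch of that partitioning on one of the column_set_* tables (8 partitions is an arbitrary choice). One caveat I'm fairly sure of: MySQL does not allow foreign keys on partitioned tables, so if you partition these tables you would have to enforce the relationship back to the parent table in the application instead:

    ALTER TABLE column_set_1
        PARTITION BY HASH (RecID)
        PARTITIONS 8;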

Best of luck, and maybe someone with better MySQL expertise can correct any mistakes I've made.