MySQL – How to improve INSERT performance in AWS RDS

amazon-rds, aws, bulk-insert, insert, mysql

I have a MySQL db.t2.small instance in AWS. The server isn't in my country, but the round trip to it is about 70 ms. Now I need to insert over 7000 records into a table, but it takes a long time: over 6 minutes. I tried many configurations:

-query_cache_size=16777216
-query_cache_type=1
-net_write_timeout=300
-max_allowed_packet=536870912
-innodb_lock_wait_timeout=3600
-innodb_flush_log_at_trx_commit=2 (I also tried setting it to 0.)

I tried sending the requests one by one and also in bulk, but the result is the same: almost 20 inserts per second.

What am I missing?

My Java bulk code:

try {
    conn = bd.getConectionAWS();
    PreparedStatement stmt = null;
    if (conn.getAutoCommit()) {
        conn.setAutoCommit(false);
    }
    try {
        String query = "INSERT INTO CONTRATO(Codigo,Nombre,IdEstado,DesEstado,FechaInicio,IndActivo,Activo,Descripcion,Cliente,Responsable) VALUES(?,?,?,?,?,?,?,?,?,?) \n"
                + " ON DUPLICATE KEY \n"
                + " UPDATE Nombre=?, IdEstado=?, DesEstado=?, FechaInicio=?, IndActivo=?, Activo=?, Descripcion=?, Cliente=?, Responsable=?";
        stmt = conn.prepareStatement(query);
        for (Contrato contrato : listaDatos) {
            // Parameters 1-10: values for the INSERT column list.
            stmt.setString(1, contrato.getCodigo());
            stmt.setString(2, contrato.getNombre());
            stmt.setInt(3, contrato.getIdEstado());
            stmt.setString(4, contrato.getDesEstado());
            stmt.setTimestamp(5, contrato.getFechaInicio());
            stmt.setString(6, contrato.getIndActivo());
            stmt.setString(7, contrato.getActivo());
            stmt.setString(8, contrato.getDescripcion());
            stmt.setString(9, contrato.getCliente());
            stmt.setString(10, contrato.getResponsable());
            // Parameters 11-19: the same values again, for the UPDATE clause.
            stmt.setString(11, contrato.getNombre());
            stmt.setInt(12, contrato.getIdEstado());
            stmt.setString(13, contrato.getDesEstado());
            stmt.setTimestamp(14, contrato.getFechaInicio());
            stmt.setString(15, contrato.getIndActivo());
            stmt.setString(16, contrato.getActivo());
            stmt.setString(17, contrato.getDescripcion());
            stmt.setString(18, contrato.getCliente());
            stmt.setString(19, contrato.getResponsable());
            //          stmt.executeUpdate();
            stmt.addBatch();
        }
        stmt.executeBatch();
        conn.commit();
    } finally {
        if (stmt != null) {
            stmt.close();
        }
    }
} catch (SQLException e) {
    e.printStackTrace();
}
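
Side note: MySQL's VALUES() function lets the ON DUPLICATE KEY UPDATE clause refer back to the value that would have been inserted, so the nine repeated parameters are unnecessary. A minimal sketch of the same statement cut down to 10 parameters (same table and columns as above; in MySQL 8.0.20+ the INSERT ... AS alias syntax is preferred, but VALUES() still works):

// Hypothetical drop-in replacement for the query string above; parameters
// 11-19 and their setter calls can then be deleted.
String query = "INSERT INTO CONTRATO(Codigo,Nombre,IdEstado,DesEstado,FechaInicio,IndActivo,Activo,Descripcion,Cliente,Responsable) VALUES(?,?,?,?,?,?,?,?,?,?)"
        + " ON DUPLICATE KEY UPDATE Nombre=VALUES(Nombre), IdEstado=VALUES(IdEstado), DesEstado=VALUES(DesEstado), FechaInicio=VALUES(FechaInicio),"
        + " IndActivo=VALUES(IndActivo), Activo=VALUES(Activo), Descripcion=VALUES(Descripcion), Cliente=VALUES(Cliente), Responsable=VALUES(Responsable)";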

I took a few things from https://www.mkyong.com/jdbc/jdbc-preparedstatement-example-batch-update/

My "1 by 1" is the same, just replaced addBatch for executeUpdate();.

A SELECT 1; query takes 0.059 sec.

Best Answer

There are two ways to efficiently (read: quickly) load 7000 rows. But first, see why it is slow now: every statement pays a full network round trip, and 7000 round trips at roughly 70 ms each comes to about 490 seconds, which matches your 6+ minutes and your ~20 inserts per second (one insert per ~50 ms). Both techniques below work by collapsing those round trips:

  • LOAD DATA INFILE -- after you have built a 7000-line CSV file (see the sketch at the end of this answer).

  • "Batch" INSERT -- like INSERT INTO t (a,b) VALUES (1,2),(5,6),(9,2), ...; -- Be cautious about the number of rows. 100 to 1000 is a good range of what to do at a time.

max_allowed_packet=536870912 -- NO, not in a tiny 2GB VM; change to 16M. Other likely settings to check:

key_buffer_size = 10M
innodb_buffer_pool_size = 200M

I assume your tables are InnoDB?
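
For the LOAD DATA INFILE route, note that on RDS you cannot put a file on the database server, so it has to be the LOCAL variant, where the client streams the CSV to the server in a single command. A minimal sketch, assuming a contratos.csv in the working directory and allowLoadLocalInfile=true on the connection (endpoint and credentials are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class LoadDataSketch {
    public static void main(String[] args) throws SQLException {
        // allowLoadLocalInfile=true lets the client stream a local file to the
        // server. Endpoint and credentials are placeholders.
        String url = "jdbc:mysql://myinstance.abc123.us-east-1.rds.amazonaws.com:3306/mydb"
                + "?allowLoadLocalInfile=true";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement()) {
            // The whole 7000-line CSV travels to the server in one command.
            int rows = stmt.executeUpdate(
                    "LOAD DATA LOCAL INFILE 'contratos.csv' INTO TABLE CONTRATO"
                    + " FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'"
                    + " (Codigo, Nombre, IdEstado, DesEstado, FechaInicio,"
                    + "  IndActivo, Activo, Descripcion, Cliente, Responsable)");
            System.out.println(rows + " rows loaded");
        }
    }
}

LOAD DATA has no ON DUPLICATE KEY UPDATE; if you need upsert semantics, load into a staging table and run one INSERT ... SELECT ... ON DUPLICATE KEY UPDATE from it.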