MySQL – Insert multiple rows in one single INSERT

bulk-insert, csv, MySQL, python

I have done bulk insertion into MySQL using the multiple-row INSERT method from a Python script. My test CSV file consists of only 100,000 lines (the real one consists of millions of lines), so I perform the insertion in bulk.
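
For reference, here is a simplified sketch of the chunked multi-row INSERT I am using (shown with MySQL Connector/Python; the connection details, the CSV path, and the second column val are placeholders):

    # Simplified sketch of the chunked multi-row INSERT; connection details,
    # the CSV path, and the second column `val` are placeholders.
    import csv
    import mysql.connector

    def load_csv(path, chunk_size):
        conn = mysql.connector.connect(user="user", password="pw",
                                       host="localhost", database="test")
        cur = conn.cursor()

        def flush(rows):
            if not rows:
                return
            # One INSERT statement with len(rows) value tuples.
            placeholders = ", ".join(["(%s, %s)"] * len(rows))
            sql = "INSERT INTO roy_table (id, val) VALUES " + placeholders
            params = [field for row in rows for field in row]  # flatten
            cur.execute(sql, params)

        batch = []
        with open(path, newline="") as f:
            for line in csv.reader(f):
                batch.append((line[0], line[1]))
                if len(batch) == chunk_size:
                    flush(batch)
                    batch = []
        flush(batch)  # remaining rows

        conn.commit()
        cur.close()
        conn.close()

    load_csv("test.csv", 1000)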

Using the Unix time command, the results look like this:

rows per INSERT     user (sec)    sys (sec)
10                  13.675        0.192
100                 11.948        0.076
1,000                9.908        0.080
10,000              11.672        0.072
100,000             12.024        0.079

To make it simpler, I add user + sys, and the results are:

10 rows            13.867 seconds
100 rows           12.024 seconds
1,000 rows          9.988 seconds
10,000 rows        11.744 seconds
100,000 rows       12.103 seconds

I am trying to figure out the main reason why multiple-row insertion performs best at around 500-1000 rows per INSERT and becomes counterproductive below 500 and above 1000. I searched the internet and found different answers;
some mention that it depends on

max_allowed_packet, bulk_insert_buffer_size, key_buffer_size.

I tried these parameters but was unable to see any effect.

My question is: why is bulk insertion optimal between 500 and 1000 rows per INSERT and counterproductive outside this range, and what are the main factors?
I am already using this setting:

 max_allowed_packet=16M
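
To double-check that these settings are actually in effect, a rough sketch like this (same placeholder connection details as above) prints the values the server reports:

    # Rough sketch: print the effective values of the server variables mentioned
    # above; connection details are placeholders.
    import mysql.connector

    conn = mysql.connector.connect(user="user", password="pw",
                                   host="localhost", database="test")
    cur = conn.cursor()
    for name in ("max_allowed_packet", "bulk_insert_buffer_size", "key_buffer_size"):
        cur.execute("SHOW VARIABLES LIKE %s", (name,))
        for variable, value in cur.fetchall():
            print(variable, "=", value)
    cur.close()
    conn.close()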

There is also the following index information for the table:

Table      Non_unique  Key_name  Seq_in_index  Column_name  Collation  Cardinality  Sub_part  Packed  Null  Index_type
roy_table  0           PRIMARY   1             id           A          100650       NULL      NULL          BTREE

Can somebody please point me to a benchmark on bulk-insertion efficiency, or suggest how I can explain why it becomes counterproductive above a certain point? I have to give a solid reason in my report. I would be really thankful for any small hint or idea. Thanks.

Best Answer

  • The elapsed (wall-clock) time, not the CPU time, is more interesting; see the sketch after this list.
  • You are measuring the client time, not the server time.
  • There are many factors on the server that get 'bad' for large chunks: buffers, undo log, replication delays, etc.
  • The parse time is the main overhead (on the server) for short chunks.
  • You have not specified the row size; this has some impact on the client, network, and server.
  • What connection between client and server? Socket? LAN? WAN? What is the latency between them? On a WAN, the network delays would overwhelm perhaps all other considerations.
  • `key_buffer_size` applies only to MyISAM; I hope you are not using that engine.
  • Were there any indexes on the table? If so, what? If, for example, the PRIMARY KEY were a UUID, it could have a huge impact. Also, UNIQUE keys need to check for duplicates. (Again, this is a server task.)
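
For example, here is a rough sketch that measures elapsed (wall-clock) time versus client CPU time per chunk size, assuming a chunked-insert routine like the load_csv() sketched in the question:

    # Rough sketch: elapsed (wall-clock) time vs. client CPU time per chunk size.
    # load_csv(path, chunk_size) stands for the chunked-INSERT routine sketched in
    # the question; elapsed time includes network round trips and server work,
    # while process_time() is roughly what `time` reports as user + sys.
    import time

    for chunk_size in (10, 100, 1000, 10000, 100000):
        wall_start = time.perf_counter()
        cpu_start = time.process_time()
        load_csv("test.csv", chunk_size)
        wall = time.perf_counter() - wall_start
        cpu = time.process_time() - cpu_start
        print(f"{chunk_size:>7} rows/INSERT: elapsed {wall:.3f} s, client CPU {cpu:.3f} s")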

My experiments show that chunks of 100-1000 rows are close to optimal in many cases. (This happens to somewhat agree with your findings, but that is coincidental, since I looked at different timings.)