Mysql – Ram & Concurrency: MySQL InnoDB LOAD INLINE FILE on 50 30GB CSV files

csvmariadbMySQL

As the title suggests I'm moving around a lot of data. This is in an environment with a 24 core CPU and 2TB of M.2 storage.

My first question is: given I have 16GB of ram, does that mean it's impossible for me to load a 30GB file into SQL (as the entire thing must be stored in ram?)

The second question: This is a fresh table I'm loading into, so I have the luxury of dictating which engine I can use. I've read that InnoDB is better for writing (in which I'm doing alot of). However, MySQL manual says that the CONCURRENCY clause will work on MyISAM. So which engine should I use?

Any other helpful advice is appreciated.

My mysqld conf:

innodb_buffer_pool_size = 12G
innodb_log_file_size    = 4G
innodb_write_io_threads = 18

Best Answer

MyISAM is not really in the running anymore. Go with InnoDB.

No, the file/table never needs to be entirely in RAM at once.

Are you using LOAD DATA INFILE? That is probably the best way to do the loading.

Will the 50 be put into a single table? That could be 50 LOADs in a row, or 50 in parallel, with some contention. Or somewhere in between. Since the task is mostly I/O, loading more than a few at a time will not speed things up.

It is common to load data into a temp file, then massage the data to fix errors, normalize, etc. Then copy the cleansed data over to the 'real' table. This intermediate file could be MyISAM or Aria, but there is probably no strong argument for or against the table choice. The 'real' table should be InnoDB; period.

If you use such a staging table, it should probably not have any indexes. All the operations would [probably] be on the entire table, so indexing would be an unused burden.

If this is a Master, there could be replication issues. (Not covered here.)