MySQL – Importing from CSV file is crashing the MySQL server

csv, import, MySQL

I am facing a problem importing a large amount of data into a single MySQL table.
I assume it is happening because of limited machine resources, but
I am having trouble proving it.

This is my CSV file information:

root@my:~# tail -n 1 /tmp/keys.csv 
ffff7771-e330-9d8e-6783-daf8a75fe4ef,7
root@my:~# wc -l /tmp/keys.csv 
162281544 /tmp/keys.csv

So basically I have 162,281,544 lines of

ffff7771-e330-9d8e-6783-daf8a75fe4ef,7

in my CSV, and the file size is 6 GB.

I need to import this table on a daily basis and use it only for SELECT queries by key, e.g.:

SELECT uuid_tags_codes FROM `keys` WHERE uuid = 'ffff7771-e330-9d8e-6783-daf8a75fe4ef';

I chose MyISAM as my table engine because I read that it should be better for selecting information.

CREATE TABLE `keys` (
  `uuid` varchar(60) NOT NULL,
  `uuid_tags_codes` int(11) NOT NULL,
  PRIMARY KEY (`uuid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

I hope all this information clarifies the technical details; if you think I missed something, I will be happy to provide more.

The problem occurs when I use the LOAD DATA INFILE command, as below:

LOAD DATA LOCAL INFILE '/tmp/keys.csv'
INTO TABLE `keys`
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(uuid,uuid_tags_codes);

After I execute this command, MySQL keeps running but becomes unresponsive, and I have to delete the table from the filesystem and restart the server.
Also, I think the system does not have enough storage: there is only 10 GB of free space in total before importing the file.
But this is OK; I can always improve the server machine by extending the SSD size.

Here are my questions:

  1. In case I extend my SSD drive, should this process work? Should a single MySQL table be able to handle 6-8 GB of data in two columns?

  2. Is MySQL the right solution for my case, or should I use a different database server for pulling this info?

  3. If using MySQL is OK, is my table definition suitable for the data I am storing in it, and what type of indexing should I use?

Best Answer

Let's do some calculation:

uuid: 36 bytes (add 1 length byte because it is varchar) ==> 37
uuid_tags_codes: 4 bytes
When indexed, full length of the uuid field: ~100 bytes

(37 + 4 + 100) bytes * 162,281,544 rows ≈ 21.3 GB

So, 21 GB without adding the overhead! Therefore, your first problem is disk space.
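
If you want to check this estimate against reality after a (partial) import, MySQL can report the actual table and index size. A minimal sketch using the standard information_schema.tables view (the GB rounding is only for readability):

-- Approximate on-disk size of the `keys` table and its index.
SELECT table_name,
       ROUND(data_length  / 1024 / 1024 / 1024, 1) AS data_gb,
       ROUND(index_length / 1024 / 1024 / 1024, 1) AS index_gb
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND table_name = 'keys';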

To answer your questions:

  1. It is more than 6-8 GB; to be safe you need ~30 GB. A single MyISAM table can handle that, but I don't recommend it: MyISAM is not as safe as InnoDB, for example.
  2. MySQL is not the wrong solution, but I think there are better ones. Since this is a kind of lookup table, a key-value store such as memcached or Redis would do a better job.
  3. You can improve the structure by making uuid char(36). In fact, this article has an even more efficient solution. Also, use the smallest integer type that fits the maximum value of uuid_tags_codes (see the combined sketch after this list).
  4. You may want to load the data into the table without its indexes first, then create the required index (also shown in the sketch below).
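
Here is a minimal sketch of points 3 and 4 combined. It assumes the "more efficient solution" means packing the dashed text UUID into a BINARY(16) column (a common trick, done below with UNHEX and REPLACE), and it assumes SMALLINT is large enough for your maximum uuid_tags_codes value; adjust the type if it is not.

-- 1) Create the table without the primary key, so the bulk load
--    does not have to maintain an index on every insert.
CREATE TABLE `keys` (
  `uuid` binary(16) NOT NULL,          -- 16 bytes instead of 37
  `uuid_tags_codes` smallint NOT NULL  -- assumes values fit in 2 bytes
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

-- 2) Load the CSV, packing the dashed UUID into 16 bytes on the fly.
LOAD DATA LOCAL INFILE '/tmp/keys.csv'
INTO TABLE `keys`
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(@uuid, uuid_tags_codes)
SET uuid = UNHEX(REPLACE(@uuid, '-', ''));

-- 3) Build the index once, after all rows are loaded.
ALTER TABLE `keys` ADD PRIMARY KEY (`uuid`);

Lookups then pack the search key the same way:

SELECT uuid_tags_codes FROM `keys`
WHERE uuid = UNHEX(REPLACE('ffff7771-e330-9d8e-6783-daf8a75fe4ef', '-', ''));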

HTH