MySQL – Blank fields in tab-delimited .txt file causing errors in LOAD DATA INFILE

MySQL, mysql-8.0

I am working with MySQL 8.0 through MySQL Workbench on a Windows machine.

I am having a great deal of trouble importing an Excel file into a MySQL database. I think the problem is that my data uses blank fields or a single period (.) to denote missing values.

I received the data as an Excel spreadsheet. Having read that CSV format can sometimes cause problems, I saved the spreadsheet as a tab-delimited text file. I created a table in MySQL Workbench matching the fields of the spreadsheet, so everything is ready for the import. When I run the following statement in MySQL Workbench:

LOAD DATA INFILE 'The/path/to/my/file.txt'
INTO TABLE my_table
FIELDS TERMINATED BY '\t'
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS

I receive errors about incorrect values.

Some values in the data are not of the type the database expects. For example, one column expects a DATE, but some of the corresponding fields in the data are blank. I think the error occurs when MySQL tries to import such a blank value.

Other columns contain either blanks (no value) or a single period (.) to denote that there is no value for that position. Depending on the column, the database may expect type Date or Int. I have searched extensively for advice on the subject. The first answer on this page looked hopeful, but after implementing a copy of that solution I received the following error:

Error Code: 1366. Incorrect integer value: '' for column 'abs_2012' at row 1.

abs_2012 is of type Int, and its value in row one is blank. What is particularly strange is that abs_2012 is the last of about 30 columns, and I haven't received any errors about the previous columns.

Does anyone have any idea at all?

Many thanks in advance

Best Answer

Use the SET form of LOAD DATA INFILE: load the whole line into a user-defined variable, then parse it into separate fields.

Example:

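LOAD DATA INFILE 'The/path/to/my/file.txt'
INTO TABLE my_table
/* The default field terminator is '\t'; override it with a string that never
   occurs in the data (this one is a placeholder) so the whole line goes into @temp. */
FIELDS TERMINATED BY '<<never-in-data>>'
IGNORE 1 ROWS
( @temp )
/* SET accepts only column names on the left-hand side, so @count (the number of
   fields in the line) is assigned inline with := inside the first expression.
   := is deprecated in MySQL 8.0 but still accepted; this assumes the assignments
   are evaluated in the order written. */
SET id = CASE WHEN (@count := IF(@temp = '', 0,
                        1 + LENGTH(@temp) - LENGTH(REPLACE(@temp, '\t', '')))) > 0
              THEN SUBSTRING_INDEX(@temp, '\t', 1)
              ELSE NULL END,
    field1 = CASE WHEN @count > 1
                  THEN SUBSTRING_INDEX(SUBSTRING_INDEX(@temp, '\t', 2), '\t', -1)
                  ELSE DEFAULT(field1) END,
    field2 = CASE WHEN @count > 2
                  THEN SUBSTRING_INDEX(SUBSTRING_INDEX(@temp, '\t', 3), '\t', -1)
                  ELSE CURRENT_TIMESTAMP END,
    /* and similar expressions for all other fields */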
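
To see how the nested SUBSTRING_INDEX calls pick out one field, the expressions can be tried on a literal line before running the import. This is only a sketch: the sample line is made up, and @field_count is a name introduced here for illustration (it holds the number of fields, that is, the number of tabs plus one).

```sql
-- A made-up line with three tab-separated fields; the middle one is blank.
SET @temp = '17\t\t2012-03-01';

-- Number of fields = number of tabs + 1 (0 for an empty line).
SET @field_count = IF(@temp = '', 0,
                      1 + LENGTH(@temp) - LENGTH(REPLACE(@temp, '\t', '')));

SELECT @field_count                                                AS field_count,   -- 3
       SUBSTRING_INDEX(@temp, '\t', 1)                             AS first_field,   -- '17'
       SUBSTRING_INDEX(SUBSTRING_INDEX(@temp, '\t', 2), '\t', -1)  AS second_field,  -- ''
       SUBSTRING_INDEX(SUBSTRING_INDEX(@temp, '\t', 3), '\t', -1)  AS third_field;   -- '2012-03-01'
```

Asking for a fourth field with the same pattern would return '2012-03-01' again, because SUBSTRING_INDEX returns the whole string when there are fewer delimiters than requested. That is why each field needs a guard on the field count.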

Note that a record is always inserted, even if a line is empty. You will need to delete those "empty" records after the import, and supply suitable literal values for NOT NULL columns that have no default value.
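
A minimal sketch of that cleanup step, assuming empty lines ended up with id set to NULL (as the ELSE NULL branch in the example does) and that id is a nullable column. Adjust the condition to whatever marks an empty row in your table.

```sql
-- Check how many rows came from empty lines before deleting anything.
SELECT COUNT(*) AS empty_rows
FROM my_table
WHERE id IS NULL;

-- Remove them once the count looks plausible.
DELETE FROM my_table
WHERE id IS NULL;
```

If MySQL Workbench runs with safe-update mode enabled, it may reject a DELETE whose WHERE clause does not use a key column; the mode can be switched off for the session with SET SQL_SAFE_UPDATES = 0.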

Regarding "Depending on the column, it may be of type Date or Int":

You can write a separate parsing expression for each field, depending on its data type. For example:

fieldX = CASE WHEN @count > X  
              THEN some_parsing_function_or_expression(SUBSTRING_INDEX(SUBSTRING_INDEX(@temp, '\t', X+1), '\t', -1))  
              ELSE {default value proper for field's data type} END,

or the same using an intermediate variable:

/* SET accepts only column names on the left-hand side, so @tmpX is assigned inline with := */
fieldX = CASE WHEN (@tmpX := CASE WHEN @count > X
                                  THEN SUBSTRING_INDEX(SUBSTRING_INDEX(@temp, '\t', X+1), '\t', -1)
                                  ELSE 'default value proper for field data type' END)
                   = 'default value proper for field data type'
              THEN @tmpX
              ELSE some_parsing_function_or_expression(@tmpX) END,
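
As an illustration of what some_parsing_function_or_expression might look like for the Int and Date columns in the question, the sketch below turns a blank field or a single period into NULL and converts everything else. The variable names and sample values are made up, the target columns are assumed to be nullable, and the day/month/year date format is an assumption about how Excel wrote the file.

```sql
-- Made-up raw field values, as they might come out of SUBSTRING_INDEX.
SET @raw_int  = '.';
SET @raw_date = '01/03/2012';

SELECT
    -- Blank, whitespace-only, or '.' becomes NULL; anything else is cast to an integer.
    CAST(NULLIF(NULLIF(TRIM(@raw_int), ''), '.') AS SIGNED)            AS parsed_int,   -- NULL
    -- Same NULL handling, then parse a day/month/year date (format is an assumption).
    STR_TO_DATE(NULLIF(NULLIF(TRIM(@raw_date), ''), '.'), '%d/%m/%Y')  AS parsed_date;  -- 2012-03-01
```

Inside the LOAD DATA statement, the same expressions would take the extracted field in place of @raw_int or @raw_date. If the file has Windows line endings and lines are split on '\n', the last field of each line may carry a trailing '\r'; wrapping the value in TRIM(TRAILING '\r' FROM ...) should remove it.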