MySQL Workbench – Row Truncated Due to More Data Than Input Columns

MySQLmysql-workbench

I am running the following script to load data from a .csv file on MySQL Workbench:

SET foreign_key_checks = 0;

LOAD DATA LOCAL INFILE "C:\\movieDB\\movie_metadata_edited.csv"
INTO TABLE IMDB.TITLE
FIELDS TERMINATED BY ',' ENCLOSED BY '' ESCAPED BY '\\'
LINES TERMINATED BY '\n' STARTING BY ''
IGNORE 1 LINES

(Title_id, Length, @dummy, Primary_title, @dummy, @dummy, @dummy, Language, Country, Content_rating, @dummy, @dummy)
;

and it loads just fine except for it gives the following warning, for n= 65, 239, 275, 280, 689, 817, 898, 924, 954, 1100, 1216, 1502, 1552, 1776, 1785, 1799, 1847, 1890, 1957, 2035, 2108, 2323, 2371, 2437, 2505, and 2606

Row n was truncated; it contained more data than there were input
columns.

the file I'm trying to load is movie_metadata.csv, and my table has columns
Title_id, Title_type, Primary_title, Content_rating, Start_year, Length, Language, Country, Fancenum_in_poster.

I found a solution for when every row comes back with this warning, but no solution was found for when it's only the occasional row.

How can I fix this and get rid of the warning? If I need to fix the file, how to I find each of those rows? The first row in my database doesn't seem to correspond with the first row in the file.

Best Answer

Try opening your file with a text editor, and going to the 66th line (header + 65).

You'll see the movie_title is:

"The Chronicles of Narnia: The Lion, the Witch and the Wardrobe "

This title includes a comma, which (based on your criteria) means that the above is interpreted as movie_title ("The Chronicles of Narnia: The Lion) and num_voted_users (the Witch and the Wardrobe "). this will push each of the following columns over by one, leaving you with an extra column in this row.

And, of course, that's not a particularly useful num_voted_users value....

However, the solution is already there in the file (for this row at least - presumably for the others). This .csv file appears to be formatted so that values that include the column separator are enclosed by double quotes (").

Try running your script with the following modification:

FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\'

If MySQL Workbench handles this like most systems I've worked with, it won't assume that every value is enclosed by double quotes; however, if a value starts with double quotes, it will ignore column separators until it gets to the closing double quote.


Tip: I downloaded your file, opened it in a text editor, and did the following to find lines with problems:

  • counted the number of commas in the header line (there were 11);
  • using a regular expression search, marked all lines with exactly 11 commas;
  • Inverted the marking, so only lines with more or less than 11 commas were marked;
  • Looked at the first marked line.

(FYI - I use TextPad as my editor; any good text editor should be able to do something like this, though the details may vary).