I am running the following script to load data from a .csv file on MySQL Workbench:
SET foreign_key_checks = 0;
LOAD DATA LOCAL INFILE "C:\\movieDB\\movie_metadata_edited.csv"
INTO TABLE IMDB.TITLE
FIELDS TERMINATED BY ',' ENCLOSED BY '' ESCAPED BY '\\'
LINES TERMINATED BY '\n' STARTING BY ''
IGNORE 1 LINES
(Title_id, Length, @dummy, Primary_title, @dummy, @dummy, @dummy, Language, Country, Content_rating, @dummy, @dummy)
;
and it loads just fine except for it gives the following warning, for n= 65, 239, 275, 280, 689, 817, 898, 924, 954, 1100, 1216, 1502, 1552, 1776, 1785, 1799, 1847, 1890, 1957, 2035, 2108, 2323, 2371, 2437, 2505, and 2606
Row n was truncated; it contained more data than there were input
columns.
the file I'm trying to load is movie_metadata.csv, and my table has columns
Title_id
, Title_type
, Primary_title
, Content_rating
, Start_year
, Length
, Language
, Country
, Fancenum_in_poster
.
I found a solution for when every row comes back with this warning, but no solution was found for when it's only the occasional row.
How can I fix this and get rid of the warning? If I need to fix the file, how to I find each of those rows? The first row in my database doesn't seem to correspond with the first row in the file.
Best Answer
Try opening your file with a text editor, and going to the 66th line (header + 65).
You'll see the
movie_title
is:This title includes a comma, which (based on your criteria) means that the above is interpreted as
movie_title
("The Chronicles of Narnia: The Lion
) andnum_voted_users
(the Witch and the Wardrobe "
). this will push each of the following columns over by one, leaving you with an extra column in this row.And, of course, that's not a particularly useful
num_voted_users
value....However, the solution is already there in the file (for this row at least - presumably for the others). This
.csv
file appears to be formatted so that values that include the column separator are enclosed by double quotes ("
).Try running your script with the following modification:
If MySQL Workbench handles this like most systems I've worked with, it won't assume that every value is enclosed by double quotes; however, if a value starts with double quotes, it will ignore column separators until it gets to the closing double quote.
Tip: I downloaded your file, opened it in a text editor, and did the following to find lines with problems:
(FYI - I use TextPad as my editor; any good text editor should be able to do something like this, though the details may vary).