I'm trying to import the allCountries.txt
file from http://download.geonames.org into MySQL 5.7 using LOAD INFILE
but getting the following error:
ERROR 1300 (HY000): Invalid utf8 character string:
''Afikanisitani,'Apekanikana,A Phu Han (Afghanistan),A Phú Hãn '
procedure:
– download & unzip the allCountries.zip from http://download.geonames.org/export/dump/
– create a table in mysql / phpmyadmin / MySQLWorkbench for the import:
CREATE TABLE `geoname` (
`geonameid` int,
`name` varchar(200),
`asciiname` varchar(200),
`alternatenames` varchar(4000),
`latitude` float,
`longitude` float,
`feature_class` char(1),
`feature_code` varchar(10),
`country_code` varchar(2),
`cc2` varchar(60),
`admin1_code` varchar(20),
`admin2_code` varchar(80),
`admin3_code` varchar(20),
`admin4_code` varchar(20),
`population` int,
`elevation` int,
`dem` int,
`timezone` varchar(40),
`modification_date` date,
`id` int NOT NULL AUTO_INCREMENT PRIMARY KEY
) CHARACTER SET utf8;
– run the following command to import the .txt file:
LOAD DATA LOCAL INFILE '/location/of/my/allCountries.txt' INTO TABLE geoname;
This procedure worked fine in MySQL 5.6 – but fails with the aforementioned error in MySQL 5.7.13
Best Answer
(Caution: TMI coming...)
There are definitely utf8mb4 characters in that data.
The
LOAD DATA
neededCHARACTER SET utf8mb4
.The
alternatenames
column has 46 cases of 4-byte UTF-8 codes. Here's one:I found such rows with
WHERE HEX(alternatenames) REGEXP '^(..)*F0'
The last character is HEX F0A18E9A, which needs utf8mb4.
Another example is id 281184, where
????????????
is the "Gothic" writing for Jerusalem.FYI, here is the table and the LOAD DATA I used:
Note the careful picking of ascii vs utf8mb4 for charset.
A
SHOW WARNINGS
indicated that that is still not 'correct':That can be solved via use of an @variable, and a
SET
in theLOAD DATA
.You were stopped at least by Afghanistan in Gothic:
??????????
. That's almost readable!(That was fun.)