I'm using MySQL 5.7.17 to import a ~50GB fixed width text file with over 1000 columns, but after 24 hours it still hasn't finished running. I'm using MyISAM to get around the InnoDB column limit, and also because I thought it would be faster than InnoDB for importing (source here: https://dbahire.com/testing-the-fastest-way-to-import-a-table-into-mysql-and-some-interesting-5-7-performance-results/).
Everything I'm importing comes from a fixed-width text file, and I'm loading all of it into CHAR columns in a ~1200-column table like this:
CREATE TABLE bigFile.BigFileTest (
`var1` CHAR(3),
`var2` CHAR(5),
`var3` CHAR(12),
...
) ENGINE=MyISAM CHARSET = latin1;
LOAD DATA LOCAL INFILE
'C:\\path\\to\\file'
INTO TABLE bigFile.BigFileTest
(@row)
SET `var1` = TRIM(SUBSTR(@row,1,3)),
`var2` = TRIM(SUBSTR(@row,4,5)),
`var3` = TRIM(SUBSTR(@row,9,12))
...
;
I'd definitely prefer to not have a ~1200 column table, of course, but it's a requirement for the data I'm working with and I'm unable to change it.
Is there any way to speed up the import process? I don't have much experience with database administration, so I may well have skipped even basic tuning steps that are obvious to you. I'm using MySQL Workbench with the defaults for everything except ENGINE=MyISAM (as shown above).
Thanks for your help!
Best Answer
MyISAM is probably as fast as InnoDB for this action.

I suggest you minimize the work during the LOAD and plan on doing the TRIM, etc., after the data is loaded. That is, load the data into a staging table; then perform the various conversions as you copy the data into the real table; then drop the staging table.

This 'staging' step also gives you an easy way to split the columns up into multiple tables. Or to consolidate the columns into a single JSON string, or something else.
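A minimal sketch of that staging approach, using a hypothetical one-column staging table (the real offsets, widths, and target columns depend on your layout):

```sql
-- Stage 1: load each raw line untouched into a single-column staging table.
CREATE TABLE bigFile.Staging (
    t TEXT NOT NULL
) ENGINE=MyISAM CHARSET=latin1;

LOAD DATA LOCAL INFILE 'C:\\path\\to\\file'
INTO TABLE bigFile.Staging (t);

-- Stage 2: do the TRIM/SUBSTR work in bulk while copying to the real table.
-- (Only three columns shown; in practice this lists all ~1200.)
INSERT INTO bigFile.BigFileTest (`var1`, `var2`, `var3`)
SELECT TRIM(SUBSTR(t, 1, 3)),
       TRIM(SUBSTR(t, 4, 5)),
       TRIM(SUBSTR(t, 9, 12))
FROM bigFile.Staging;

-- Stage 3: clean up.
DROP TABLE bigFile.Staging;
```

This keeps the LOAD DATA step a plain bulk copy, and moves all per-column work into one set-based INSERT ... SELECT.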
If you provide more details on the data, the transformations, and the 1000+ columns, I can be more specific.
More
Oh! Sounds like punched cards. Or mag tape. I would load each raw line into a single TEXT column t, plus an id INT UNSIGNED AUTO_INCREMENT NOT NULL primary key. Then write SELECT queries with SUBSTR(t, 12, 23)-style clauses, based on whatever the split-up needs to be, to pull the data out and put it into the other tables. Continuing with the example:
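The example that belongs here was lost from this copy of the answer; the following sketch shows the kind of query it presumably contained. The cities table and the column offsets are illustrative assumptions, not the answer's original code:

```sql
-- Hypothetical layout: state in cols 1-2, county in 3-5, city in 6-30.
CREATE TABLE cities (
    id     INT UNSIGNED AUTO_INCREMENT NOT NULL,
    state  CHAR(2)     NOT NULL,
    county CHAR(3)     NOT NULL,
    city   VARCHAR(25) NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY (state, county, city)   -- one row per distinct place
) ENGINE=MyISAM;

-- Pull the distinct places out of the staged raw lines
-- (t is the TEXT column holding each raw fixed-width line).
INSERT IGNORE INTO cities (state, county, city)
SELECT DISTINCT TRIM(SUBSTR(t, 1, 2)),
                TRIM(SUBSTR(t, 3, 3)),
                TRIM(SUBSTR(t, 6, 25))
FROM Staging;
```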
After that, it gets trickier, because you want to use cities.id in whatever table comes next, rather than repeating the state+county+city again and again.
Get the idea? Want to share the type of data you have?
Sure it will take hours, probably even days. But the structured result will be better than a nearly useless regurgitation of the raw data.
Also, think about how you want to eventually query (SELECT) the data.