I have 1000 CSV files. Each CSV file is between 1 and 500 MB and is formatted the same way (i.e. same column order). I have a header file for column headers, which match my DynamoDB table's column names. I need to import those files into a DynamoDB table. What's the best way / tool to do so?
I can concatenate those CSV files into a single giant file (I'd rather avoid that, though), or convert them to JSON if needed. I am aware of the existence of BatchWriteItem, so I guess a good solution would involve batch writing.
Example:
- The DynamoDB table has two columns: first_name, last_name
- The header file only contains:
first_name,last_name
- One CSV file looks like:
John,Doe
Bob,Smith
Alice,Lee
Foo,Bar
Best Answer
In the end I coded a Python function
import_csv_to_dynamodb(table_name, csv_file_name, column_names, column_types)
that imports a CSV file into a DynamoDB table. Column names and column types must be specified. It uses boto, and takes a lot of inspiration from this gist. Below are the function, a demo (main()), and the CSV file used. Tested on Windows 7 x64 with Python 2.7.5, but it should work on any OS that has boto and Python.

test.csv's content (must be located in the same folder as the Python script):
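Assume test.csv holds the four sample rows from the question (no header row, since the column names are passed to the function explicitly):

John,Doe
Bob,Smith
Alice,Lee
Foo,Bar

Here is a minimal sketch of the function and demo (not the verbatim original listing), written against boto's dynamodb2 layer; the table name my_table and the per-column str casts are illustrative assumptions, not part of the original setup:

    import csv
    from boto.dynamodb2.table import Table

    def import_csv_to_dynamodb(table_name, csv_file_name, column_names, column_types):
        """Import a CSV file into an existing DynamoDB table.

        column_names -- attribute names, in the same order as the CSV columns
        column_types -- casting callables (e.g. str, int), one per column
        """
        table = Table(table_name)
        with open(csv_file_name, 'rb') as f:  # 'rb': Python 2's csv module
            reader = csv.reader(f)
            # batch_write() buffers the puts and flushes them 25 at a time
            # via BatchWriteItem, resending unprocessed items automatically.
            with table.batch_write() as batch:
                for row in reader:
                    item = {name: cast(value) for name, cast, value
                            in zip(column_names, column_types, row)}
                    batch.put_item(data=item)

    def main():
        # 'my_table' is a placeholder; its hash key must be one of the columns.
        import_csv_to_dynamodb('my_table', 'test.csv',
                               ['first_name', 'last_name'], [str, str])
        print 'Import finished.'

    if __name__ == '__main__':
        main()

Since the question involves 1000 files, the same function can simply be called in a loop, for example with glob:

    import glob

    for csv_file in glob.glob('*.csv'):
        import_csv_to_dynamodb('my_table', csv_file,
                               ['first_name', 'last_name'], [str, str])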