Mysql – Mass data upload from Excel – technology

database-designdatabase-recommendationexcelMySQLpostgresql

I am a Developer seeking help in terms of DB technology for a specific requirement. I have done some research but can not conclude with list.

Requirement:
There will be a web application which will allow user to upload Excel document in specified format(expecting 1 Million of rows) and it will then Save somewhere in DB and run some calculation against our data base values. After that will present user with some numbers/chart based on calculations.

  • My first question is MySQL DB good enough for this requirement over any other DB like Postgres?

  • Second question is how much time will it take to upload 1 million rows of data from Excel, considering there will be 6 columns?

  • Third, which technologies should I consider to make this application fast and efficient like InnoDB, etc.

Once again I have done my research but I could not reach into conclusions, and any DB expert feedback/help will be highly appreciated.

Best Answer

I can speak for Postgres. PostgreSQL is really fast to copy to if you are a super user. Postgresql has a copy to command. Merely save the file as a csv (which can also be opened in excel) and use:

COPY table FROM 'fpath' CSV HEADER DELIMITER ';'

You can copy anything with COPY (SELECT STATEMENT) TO 'fpath' as well.

If you must use excel formats like .xlsx, why not use something like Pentaho and develop an application that can connect to any database. There is a java api for this one. Other ETL tools may do the trick too.

In these instances, upload speed is under 10 minutes. My last upload over a non-commercial network of 900000 rows lasted 20 minutes in Pentaho. A straight copy to in PostgreSQL is actually much faster.

As for speed, querying is your major problem. If you are just copying and querying, Postgres is fine. However, if you are running intensive queries, counts; aggregation, use an ETL tool like Pentaho that you can run from Java and connect with a connection pool like BoneCP for Postgres or another connection pooling software for other databases.

Most databases support connection pooling but have connection limits so look at your configuration settings.