Apply in a CSV file to multiple databases

application-designcsvimportupdate

Recently, I have been working with a client whom has asked us to provide their dataset for use in a mobile application we are making for them. The requirements for this project include the ability to update the local dataset on the mobile devices with any changes that our client delivers to us. The local databases are to be stored in sqlite according to the mobile device standard.

The Problems:

  1. The Dataset: Our clients can only deliver to us an entire dump of their database in CSV format. We must be able to apply only the changes between the new CSV and our existing databases without causing duplicate entries or stale data.
  2. The Delivery: Only the changes to the dataset may be sent to the mobile databases. Resending of the entire database is too expensive.

Approach:

Currently, the solution we are looking into involves using one of our backend MySQL servers to maintain a backup of database stored on the mobile sqlite databases. When a new dataset arrives, we were planning to make a new database with the new data, and then to be able to use diff and patch linux commands to update the server side data. However, with this approach, knowing what or how to update in the mobile phone's local databases gives us some issues. I am currently unaware of any way to use the patching or the diff file from MySQL in either the Android or iOS environments.

Ideal Approach:

Preferably, when we receive the entirely new dataset, we would like to be able to detect just the changes in the new CSV against the current state of the database. Then, with this list of changes, we would generate INSERT/UPDATE/DELETE statements that would be able to apply each of the changes to both the MySQL and sqlite databases. We would store a list of each of the INSERT/UPDATE/DELETE statements along with a the date updated so that upon request each of the phones can be updated accordingly to the newest data.

However, I am also currently unaware of a way to detect changes from a new CSV to an existing database which makes constructing INSERT/UPDATE/DELETE statements difficult.

The Questions:

  1. What is the best way to get the changes between the CSV file and our current database implementation in a portable and meaningful format?
  2. What is the preferred method for cascading such changes from our server side database to the phone's local databases?

I am currently seeking any advice for either improvements to our design or proposed implementation. I feel like I can't be the first person to try and reflect changes to a CSV in a database or synchronize changes in a server side database to a local offline database.

Thank you everyone in advance for your help. I look forward to hearing alternative approaches to this problem.

Best Answer

Load the CSV file into a "staging" table. From there you can easily do the UPDATE/INSERT/DELETE in a single statement for each one

Something along the lines:

update real_table
  set ...
where exists (select 1 from staging_table where ...)

insert into ...
select 
from stage_table
  left join real_table 
where real_table.some_col is NULL

delete real_table
where not exists (select 1 from staging_table)

Then delete the data from the staging table.

Probably the fastest way to do it. The only drawback is that you have the data for one table twice in the database (until it is deleted from the staging table).