Strategy for regularly updating datasets from Excel files

database-design

I have ~10 Excel files which are produced by a third party and updated each night. I have a script that downloads them to my local machine. They each contain ~10 fields (all short text/dates) and between ~10,000 and ~1m rows.

I'm planning to create a simple web application to enable people to search the data. I'll host it on AWS or similar. Search load will be light, maybe ~1,000 searches/day.

I have to assume that all the records are unique each night and need to completely replace the online dataset.

It's relatively simple for me to convert the data from the Excel files into a database such as Postgres and create a simple search on top of it.

My question is: how do I deal with the time it takes to do the database update each night? Should I create two databases and have my application alternate between them every other night?

What is a typical strategy for dealing with a situation like this?

My current skill set is Ruby/Rails/Postgres and building simple web apps, but I've been intentionally vague about technology because I'm open-minded about what to use, and I'm quite happy to learn something new to solve the problem.

Best Answer

One method is to have two sets of tables (This1, This2, That1, That2, etc.), or a set of tables in separate databases/schemas. Either way, you now have SetA and SetB. Have a set of views that access one set of tables, say SetA. Perform the nightly import into SetB. When the new data is ready to go, recreate the views, pointing them now at the SetB tables. The next night, import into SetA.
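In Postgres, the switch itself can be a handful of `CREATE OR REPLACE VIEW` statements run in one transaction. Here is a minimal sketch in Ruby using the `pg` gem, assuming two schemas named `set_a` and `set_b` plus one `public` view per table (all names here are placeholders, not something prescribed by the answer):

```ruby
require 'pg'

# Placeholder table names; list every table that exists in both sets.
TABLES = %w[records lookups].freeze

# Re-point the public views at the given set ('set_a' or 'set_b').
def repoint_views(conn, target_set)
  conn.transaction do |tx|
    TABLES.each do |table|
      # Identifiers come from the fixed TABLES list, not user input,
      # so string interpolation is acceptable in this sketch.
      # CREATE OR REPLACE VIEW swaps the definition in place, so readers
      # see either the old set or the new set, never a mixture.
      tx.exec("CREATE OR REPLACE VIEW public.#{table} AS SELECT * FROM #{target_set}.#{table}")
    end
  end
end

conn = PG.connect(dbname: 'search_app')  # connection details are placeholders
repoint_views(conn, 'set_b')             # run after the nightly import into set_b
```

One Postgres-specific caveat: `CREATE OR REPLACE VIEW` requires the new definition to keep the same column names and types, which is naturally satisfied here because both sets of tables share the same structure.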

The views act as a software switch, and the switch-over is effectively instantaneous, so users won't notice it. It should also be dead simple to add this to your existing Ruby script so it coordinates all of it.
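For the coordination piece, the nightly script only needs to know which set the views currently point to, load the other one, and then flip. Below is a rough sketch, assuming the Excel files have already been converted to CSV and that a one-row bookkeeping table `public.active_set` tracks the live set (both are assumptions of mine, not part of the answer); it reuses the `repoint_views` helper from the previous sketch.

```ruby
require 'pg'

# csv_paths_by_table maps each table name to the CSV exported from Excel,
# e.g. { 'records' => '/data/records.csv' }; the paths are placeholders.
def nightly_refresh(conn, csv_paths_by_table)
  active  = conn.exec('SELECT name FROM public.active_set').getvalue(0, 0)
  staging = active == 'set_a' ? 'set_b' : 'set_a'

  csv_paths_by_table.each do |table, path|
    # Clear the inactive set; readers are still on the active set via the views.
    conn.exec("TRUNCATE #{staging}.#{table}")
    # COPY streams the whole file in one pass, which is far faster than
    # row-by-row INSERTs for tables approaching 1m rows.
    conn.copy_data("COPY #{staging}.#{table} FROM STDIN WITH (FORMAT csv, HEADER true)") do
      File.foreach(path) { |line| conn.put_copy_data(line) }
    end
  end

  repoint_views(conn, staging)  # the helper from the previous sketch
  conn.exec_params('UPDATE public.active_set SET name = $1', [staging])
end
```

Because the application only ever queries the views, nothing in the Rails side has to change when the underlying set flips.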