MySQL – What would be the most elegant way to periodically replicate data into a MySQL table

database-design, MySQL, optimization

I need to parse the contents of an API response and insert them into a MySQL database. The aim is to replicate the data from the API into my database table.

I need to check the API response every day (at a specified time) using a scheduled cron job, fetching all available records and inserting them into the database. The response could change every time, and those changes should be reflected exactly as provided by the API.

There aren't any fields that could be used to make a unique ID, either.

I can't simply search and update, as this is not an updating task alone; rather than 'updating' records, we should be 'replicating' the data from the API. We need to consider:

  • deletion of records removed from the API response,
  • updating of modified record fields, and
  • insertion of new records.

Consider an example:

On day 1,

Let the records available from the API be A1, A2, A3, A4.

There could be duplicate entries among these records.

We will parse the API response and insert every available record as it is. The script will fetch the records (A1, A2, A3, A4) and insert them into our table.

Our table will hence contain the records (A1, A2, A3, A4).

Now consider the next day.

Let the records available from the API be A1, A2, A3, A5, A6.

Possible cases:

  • Records A1, A2, and A3 could have field values updated since day 1.

  • A4 is removed from the API response.

  • New records A5 and A6 are added.

In this case our aim is to update the table so that it contains only the records A1, A2, A3, A5, A6, with their updated values.

The data is expected to be structured. There are specific fields from the API which are to be parsed and fetched. We can't estimate how often changes will be made, and we have been asked to update every day.

The response field values might change, but not the structure. The fields whose values should be fetched will remain the same; only the field values would change.

Currently there are about 2,000 records, which could increase to 5,000 soon.

There should not be any downtime, as the data in this table is used in real time by another application. I am parsing the API response and inserting it into a table that acts as a master table for that application. The application needs to work without interruption, i.e. there should be the least possible downtime even while we are performing operations (such as recreating tables) on the table.

Sample API Response:

[
    {
        "company": "XYZ",
        "company_id": 123,
        "owner": null,
        "owner_id": null
    },
    {
        "company": null,
        "company_id": null,
        "owner": "ABC",
        "owner_id": 321
    },
    {
        "company": "XYZ",
        "company_id": 123,
        "owner": null,
        "owner_id": null
    },
    {
        "company": null,
        "company_id": null,
        "owner": "PQR",
        "owner_id": 100
    }
]

The database table should replicate the API response as given, irrespective of any errors/duplicates present in the API results. The API responses cannot be predicted beforehand, and there aren't any fields that could be treated directly as a UNIQUE id.
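
For illustration, here is a minimal sketch of the kind of table I have in mind for this response; the table name api_data and the column types are just assumptions based on the four fields shown. Since there is no natural key, a surrogate AUTO_INCREMENT id serves as the primary key, which also lets duplicate API rows be stored as-is:

-- Hypothetical schema; the name api_data and the column types are illustrative.
-- The surrogate id is the only key, so duplicate rows from the API are kept.
CREATE TABLE api_data (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
    company    VARCHAR(255) NULL,
    company_id INT          NULL,
    owner      VARCHAR(255) NULL,
    owner_id   INT          NULL,
    PRIMARY KEY (id)
);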

I am using PHP-MySQL; the API response is in JSON format. I had seen a similar question on SO, but it doesn't cater to my requirements well and it also doesn't have an accepted answer:

https://stackoverflow.com/questions/1972944/what-is-the-best-way-to-periodically-load-data-into-table

From the above question, it seems some kind of temporary table would be needed in my case as well.

What would be the most elegant way to approach this problem, considering safety, performance, and zero downtime? What would be the best MySQL storage engine to use in this scenario (InnoDB or MyISAM)?
Please advise.

Best Answer

If you don't need to retain the previous day's data when you refresh, you could do this without downtime:

Day 1:

  • insert data from API into Table1

Day 2+:

  • create a new table with the same structure as Table1

CREATE TABLE NewTable LIKE Table1;

  • insert API data into NewTable (note: app is still using Table1 at this point)
  • switch out the tables in one statement

RENAME TABLE Table1 TO OldTable, NewTable TO Table1;

  • drop OldTable now if you want to, or keep it around as an archive
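
Putting the steps together, a sketch of the daily refresh could look like the following; the INSERT values are placeholders for whatever rows your PHP script parses out of the JSON:

-- Day 2+: build the replacement while the app keeps reading Table1.
CREATE TABLE NewTable LIKE Table1;

-- Load the freshly parsed API rows (placeholder values shown).
INSERT INTO NewTable (company, company_id, owner, owner_id)
VALUES ('XYZ', 123, NULL, NULL),
       (NULL, NULL, 'ABC', 321);

-- RENAME TABLE is atomic, so readers see either the old table or the
-- new one, never an intermediate state.
RENAME TABLE Table1 TO OldTable, NewTable TO Table1;

-- Drop the old copy, or keep it as an archive.
DROP TABLE OldTable;

Because the swap is a single atomic statement, the application querying Table1 never observes a missing or half-loaded table.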