Sql-server – Best Approach for Mirroring Data from SQL Server to MySQL

mirroringMySQLreplicationsql server

I need help in regards to replicating data from a third party point-of-sale system with a SQL Server backend to a MySQL DB.

Specifically, can anyone suggest a best way to mirror data from SQL Server on our MySQL instance?

Current process:

  • Select a dataset from the distant SQL Server and write to a text file using PHP.
  • Import the text file using LOAD INFILE using PHP.
  • Some data sets must be chunked into several iterations.

Drawbacks:

  • It is slow and cumbersome.
  • Requires polling their server every x minutes for changes.
  • Difficult to detect changes as most tables do not have an 'updated time' on the row to indicate a change. We therefore have to sometimes assume we will need the whole table.
  • Tables are up to 11 million records (and growing).
  • Bandwidth is being used inefficiently.
  • Whenever the schema changes (short notice) on the SQL Server, we have to react to the changes and change every query.

Additional Circumstances:

  • Our third party SQL Server host will not allow any changes to the system on their end.
  • Our side (MySQL) is more flexible to system changes.
  • We cannot change to a SQL Server platform as we are a LAMP shop.
  • We are very capable IT developers, but this issue seems to warrant additional advice.
  • Management is against an organic point-of-sale.

Best Answer

Does the data actually need to be replicated in the same model?

What will you be doing on the MySQL side with the data?

I would strongly consider putting either another database or instance which you control on the SQL Server side to be able to take snapshots on that side so you pull less data down (comparing the snapshot to live data should be quick) and also look at what you want to do with the data on the MySQL side, since it is possible that you actually want to transform the data during the extraction to better facilitate the load.

I think you will find that having a sandbox database or instance which you do control collocated with the vendor database will give you a tremendous amount of power, while still allowing the vendor to operate without interference.

This can also help in reducing the bandwidth (unneeded columns, denormalizing usually increases data size, but not always) and reducing the load times or the complexity of the load process. A different model on the MySQL side which corresponds to the queries you will actually be running can also improve performance of reporting or even the load.

Having views against the source data can also help mitigate against changes - although you might not be able to use DDL triggers and schemabinding to stop changes from breaking dependencies, you could possibly catch schema changes when the extraction fails instead of after the extraction has neglected to bring down some new columns.