Azure SQL – Load Data Between Temporal Tables

Tags: azure, data-synchronization, sql-server, sql-data-sync, temporal-tables

Currently we have only a Live Database where all our data resides. I have created a separate Azure server and DB for development.
The DEV DB should be kept somewhat up to date with the Live DB for developers to work on.

The Live DB has temporal (system-versioned) tables. I need a process that would sync the data from the Live temporal tables to the DEV temporal tables. It should load the SysStartTime and SysEndTime values from the Live temporal tables into DEV (it should not generate new SysStartTime and SysEndTime values in DEV).

What is the best way of doing this?

I already tried Azure Data Sync but it does NOT support temporal tables. Please help.

Best Answer

You have two interacting problems here: transferring data between Azure databases, and updating data maintained by system versioning. And possibly a third: live data in dev.

Transferring Data in Azure

If Azure Data Sync is not appropriate then there are other options that you might consider.

  • You could perhaps use SSIS packages or other ETL options to shuffle the data from A to B.
  • Linked servers are not supported in the same way as in on-prem SQL Server, but you can use elastic query "external tables" as an alternative (see the sketch after this list).
  • Elastic Database Jobs offer a way to query data from one DB and write it to another, though they are intended more for logging than for fuller data transfer, so using them this way feels a bit hacky.
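
If you go the external-table route, the setup is roughly as follows. This is a minimal sketch of elastic query, assuming a Live table dbo.Customers; every name here (LiveCred, LiveDbSource, Customers_Live, the server and DB names) is a placeholder for your own objects.

    -- Run in the DEV database.
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

    CREATE DATABASE SCOPED CREDENTIAL LiveCred
        WITH IDENTITY = '<sql login on the Live server>',
        SECRET = '<password>';

    CREATE EXTERNAL DATA SOURCE LiveDbSource WITH (
        TYPE = RDBMS,
        LOCATION = 'live-server.database.windows.net',
        DATABASE_NAME = 'LiveDb',
        CREDENTIAL = LiveCred
    );

    -- Mirror the remote table's schema, period columns included.
    CREATE EXTERNAL TABLE dbo.Customers_Live (
        CustomerId INT NOT NULL,
        Name NVARCHAR(100) NOT NULL,
        SysStartTime DATETIME2 NOT NULL,
        SysEndTime DATETIME2 NOT NULL
    ) WITH (
        DATA_SOURCE = LiveDbSource,
        SCHEMA_NAME = 'dbo',
        OBJECT_NAME = 'Customers'
    );

    -- DEV can now read the Live rows, SysStartTime/SysEndTime and all.
    SELECT CustomerId, Name, SysStartTime, SysEndTime
    FROM dbo.Customers_Live;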

Updating History Tables

Whatever method you use to transfer data from one DB to the other, you have the issue of needing to directly manipulate the history data. As you have noted, system-versioned tables protect themselves from this sort of modification: you can't modify the history table directly at all, or specify the timings logged for modifications in the base table.

What you have to do for each table is:

  1. Turn off system versioning: ALTER TABLE schema.table SET (SYSTEM_VERSIONING = OFF); ALTER TABLE schema.table DROP PERIOD FOR SYSTEM_TIME;
  2. Make the changes you need to both the base and history tables.
  3. Restart history tracking: ALTER TABLE schema.table ADD PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime); ALTER TABLE schema.table SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = schema.historyTable)); (specify HISTORY_TABLE so that your existing history table is reattached rather than a new default-named one being created, and add other options depending on how your tables are currently set up). The full sequence for one table is sketched below.
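
Putting that together, a refresh of one hypothetical table dbo.Customers (history table dbo.CustomersHistory, sourced from the Customers_Live external tables sketched above) might look like this; treat it as an outline under those assumptions, not a finished script:

    -- 1. Detach history tracking so the period columns become writable.
    ALTER TABLE dbo.Customers SET (SYSTEM_VERSIONING = OFF);
    ALTER TABLE dbo.Customers DROP PERIOD FOR SYSTEM_TIME;

    -- 2. Replace the DEV data, carrying the Live period values across.
    DELETE FROM dbo.Customers;
    DELETE FROM dbo.CustomersHistory;

    INSERT INTO dbo.Customers (CustomerId, Name, SysStartTime, SysEndTime)
    SELECT CustomerId, Name, SysStartTime, SysEndTime
    FROM dbo.Customers_Live;

    INSERT INTO dbo.CustomersHistory (CustomerId, Name, SysStartTime, SysEndTime)
    SELECT CustomerId, Name, SysStartTime, SysEndTime
    FROM dbo.CustomersHistory_Live;

    -- 3. Reattach tracking, naming the existing history table explicitly
    --    so a new default-named one is not created.
    ALTER TABLE dbo.Customers
        ADD PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime);
    ALTER TABLE dbo.Customers SET (SYSTEM_VERSIONING = ON (
        HISTORY_TABLE = dbo.CustomersHistory,
        DATA_CONSISTENCY_CHECK = ON
    ));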

So your overall process might be in four stages: bring updated data from the source, turn off history tracking, update the target data, turn history tracking back on. While those last three stages are happening you need to block other access to the target DB, or corruption of the history data could result.

Alternative

Depending on the size of your databases, and the nature of your dev workflows, it may be far easier to simply back up and restore a complete fresh copy each time you need to refresh the other copies.

You may want to work with a restored copy of the database anyway: to remove the extra I/O load that might slow production instances of your application, to easily get an accurate snapshot of the data without holding many locks, or to allow for a full depersonalisation process (see below).
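
In Azure SQL, "backup and restore" can be as simple as a database copy. A one-line sketch, with placeholder server and database names:

    -- Run in the master database of the DEV server. The copy preserves the
    -- temporal tables, history tables and period values exactly as in Live.
    CREATE DATABASE LiveDb_DevCopy AS COPY OF [live-server].LiveDb;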

Live Data In Dev

Unrelated to the technical matters of your question, this is likely to be problematic from a data protection and security point of view. If your application contains any personally identifying, or otherwise sensitive, information it is a big "no no".

If you do take data directly from Live systems for development then you should be "depersonalising" it on the way. This is usually easier to do reliably with the whole-database backup+restore method: copy the DB in production, depersonalise the copy, back that up, transfer it to the dev server, and delete the now-unneeded copy on the production server.
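
As an illustration only, one table's depersonalisation pass might look like the following (dbo.Customers and its columns are hypothetical, and a real pass must cover every sensitive column in every table):

    -- Overwrite personal data with deterministic placeholders.
    UPDATE dbo.Customers
    SET Name  = CONCAT('Customer ', CustomerId),
        Email = CONCAT('customer', CustomerId, '@example.invalid');

    -- Remember the history table holds old values too; cleaning it needs
    -- the same SYSTEM_VERSIONING = OFF steps shown earlier, or the history
    -- rows will keep the original personal data.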

It is often better to generate sample data as part of your development process rather than trying to use production data. That way a problem in any depersonalisation process won't expose you to security problems (and failed audits, if you work with companies that require you to be audited), because you never have live data to accidentally leave unaltered. Well-designed test data can be more useful than real data, too: you can make sure it contains all your expected edge cases (and some unexpected ones!) so it becomes part of your development and testing procedures.
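
For example, a seeding script (again with hypothetical table and column names) can manufacture as many plausible rows as you need, plus deliberate edge cases:

    -- Generate 1,000 synthetic customers using a row-number trick.
    INSERT INTO dbo.Customers (CustomerId, Name, Email)
    SELECT n,
           CONCAT('Customer ', n),
           CONCAT('customer', n, '@example.invalid')
    FROM (
        SELECT TOP (1000)
               ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
        FROM sys.all_objects
    ) AS nums;

    -- Plus hand-written edge cases: maximum-length values, quotes, accents.
    INSERT INTO dbo.Customers (CustomerId, Name, Email)
    VALUES (1001, N'O''Brién-' + REPLICATE(N'x', 90), N'edge@example.invalid');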

You may occasionally need to copy production data for debugging purposes, when you have failed to reproduce a reported problem using other data, but this sort of testing is different from development and should be a last resort, not a first port of call.