Managing schema changes between transactional and analytical databases

migration

How would you manage changes in a transactional database that affect an analytical database?

Environment:

  • Transactional database is a current, or near current version of MySQL. Size of the transactional database appears to be around a few terabytes, but the growth of the database is just a few hundred megabytes a day.
  • Analytical database is a current, or near current version of Mondrian

Scenario: Given an analytical database that is automatically loading data from a transactional database — when there are schema changes that result in large segments of the data being repositioned within the database transactional database that will result changes in the analytical database too. The only example I was given was that small portion of the schema for the transactional database changed in the database which to my understanding was about 300 tables; meaning I'd guess that 5-10% of the schema for the transactional database changes in some way either by adding/removing/moving column/tables/rows, and that in some cases there are global changes to how the data is formatted within the transactional database.

related meta question for this question.

Best Answer

You won't be able to automate this process. Any release of the base system will need to include an impact analysis and change to the analytic system, and this will be a manual process. You will have to update the ETL process to source data from the new structure and possibly modify the data mart as well.

This will add latency to the release process, and I've seen plenty of occasions where this was neglected and the analytic system broke. The business or owners of the transactional system will have to do one of three things:

  1. Incorporate impact analysis on the analytic system and ETL processes into their change control for the operational system.

  2. Accept that releases will sometimes break the analytic system, possibly in ways that are not obvious (unrecognised semantic changes to the data). Fixes to the analytic system may or may not be quick to implement. It is possible they could do somethng that takes days or weeks to fix in the analytics.

  3. Build a stable interface layer that the system exports data into and refactor the data mart ETL to populate via that interface.

Option 1 is almost always preferable unless the business cannot tolerate delay to the releases. This might be the case with a .com type business where fast release schedules matter.

In the case where that impact on the release is not acceptable, then the business must either implement (3) or sign off that they are happy with (2) and accept responsibility for any downtime caused by changes that break the analytic system.

Option 3 will bog down changes to the analytic systems as they may become dependent on changes to the interface and therefore require releases of the operational system with changes to populate the updated interface.

If the business insists on sticking with (1) then make them sign off something accepting responsibility for unscheduled downtime on the analytics. If you don't have a paper trail showing that you've raised this issue with them then you're at risk of copping the blame for something you have no control over.