Approach for a 2,000 table migration

database-designdatabase-recommendationdatabase-theory

What might be the best approach when converting or migrating a database containing over 2,000 tables? Where would you begin to tackle such an issue? Steps to take in the first weeks of designing? Risks? Assuming there is hardly any information about the old (current) database?

Best Answer

The approach I take is broadly as follows (and this is by no means complete):

Requirements You need to know the goals of the to be situation.
- specifics of platforms (hardware, software), versions of these. There may be dependencies in terms of feature availability
- features used in "from" situation and needed in "to" situation. Think of
  - queueing
  - clustering
  - high availability
  - backup and recovery
Analyse current situation for
- requirements missing from above
- DBMS or platform specific features being used
Design for an entirely automated and repeatable approach. This makes it
- testable
- able to run overnight/weekend in case of a long process
- allows for an iterative solution where the database becomes more and more complete during the life of the project
Determine dependencies between systems both on the "from" side and on the "to" side. This will affect the "flow" of the migration.
Decide how the data will be migrated. Amongst others, choices are:
- dump and reload, probably via some database agnostic format
  - if reloading, it may be easier to go via "temporary" tables and then populate the real target tables from the temporary ones. I've had good experience with this approach.
  - this also makes it easier to allow for "fallout" which can be manually corrected after cutover (but you need to keep this under control too).
- "suck" the data from the "from" database into the "to" database using some database connection technology (eg Oracle Gateway, PostgreSQL foreign data wrappers). This approach is usually also best when wanting to perform a migration where the target DBMS is the same as the source.
- decide whether to use scripting (SQL scripts mainly, but possibly also Perl/awk etc) or a tool (eg Oracle used to have a product called Migration Workbench or an ETL tool can be used for this purpose) or even "programmed" solutions in Java/C# etc. This decision will largely be driven by the skillset at hand. My preference is to use SQL wherever possible.
Build it.
- If the target data model is different, determine mappings. An ETL tool can be handy in these cases, but well written SQL can be just as clear. Like all code, try to keep the documentation on the mapping with the code rather than in a separate document.

Best Answer

Related Solutions

Scalable table structure for periodically updated stats that get aggregated over time

Mysql – Economic Database Architecture Possibilities w/ Data of Varying Length

Related Question