How to migrate data between “all-surrogate” databases

migration

With some periodicity this needs to be done:

  • Inserting data from a production database into a test/development database, so that programmers have fresh data to test on.
  • Migrating data from a database into another, for whatever reason.

Note: databases are relational.

The problem is:

  • Databases were modeled with a "all-PK-are-surrogate" policy.
  • Moving data from a database to another involves programming to avoid colision of PK values ( generated by sequences ).
  • A table pairing the values of the PKs of origin and destination database's tables is needed in order to do the "equivalency" between surrogate keys.
  • Said table has to be created in advance of migration, by matching business keys (which are not PK, so don't govern FKs)
  • Data migration even of a single table in not trivial, as opossed to tables with natural keys.

What's the easiest way of copying rows from one database to another when all tables have surrogates keys ?

Best Answer

We came across this problem (or at least a similar one) not long ago at work, and our solution is to divide the migration process into steps:

  • Insert the new data to a temporary staging table(s)
  • Update the surrogate keys in the staging table(s) so that there are no conflicts
  • Insert the data into the target table(s)
  • Delete the staging table(s)

The second part is the tricky part, but in our case we were able to follow this algorithm for each table:

  • If there are any foreign keys against this table, set up foreign keys in the staging table with ON UPDATE CASCADE
  • Update the primary keys of all rows to a new id range. We used negative ids. E.g. UPDATE STAGINGTABLE SET ID=-ID-1; Now there is no risk of id clashes in the next step.
  • Now, for each row in STAGINGTABLE, generate and assign a new id

Repeat for each staging table. When all staging tables are processed, INSERT the data into the target tables.

One assumption this relies on is that all data is new and independent from the data already in the database.