Migrate small sets of data from production to staging/sandbox

development migration

I have an environment consisting of production, staging, and development servers that host SharePoint 2010 or 2013 along with SQL Server 2008 R2. Each needs relevant data, but we must work within limited bandwidth and hardware budget restrictions.

Is there any way we can take a small portion of production data and transfer it to our staging and development environments? By a small portion I mean, say, the last month of data, so the storage impact is minimal, as is the time spent on additional backups and transfers. Our production database is a couple of terabytes.

Possible restrictions:

  • Limited bandwidth for transfers between domains (prod -> dev)
  • Huge database sizes
  • SharePoint: keeping only the schema plus the last month of data would leave a lot of orphaned or corrupted rows, so some kind of seed data is needed to preserve integrity

The overall goal is to reduce the database size on the development and staging servers while still capturing the most recent data.

Preferred solutions: scripts or software.

Best Answer

The first thing you need to identify in your db is which tables require all their data (the seed data, sometimes called "domain" data) and which tables only need their recent data. You then have a couple of choices.
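
A quick row-count survey helps sort the tables into those two buckets; a minimal sketch using the standard catalog views (available on SQL Server 2008 R2):

```sql
-- Survey row counts per table to separate small "domain"/seed tables
-- (ship in full) from big transactional tables (ship only recent rows).
SELECT
    s.name      AS schema_name,
    t.name      AS table_name,
    SUM(p.rows) AS row_count
FROM sys.tables t
JOIN sys.schemas s ON s.schema_id = t.schema_id
JOIN sys.partitions p ON p.object_id = t.object_id
WHERE p.index_id IN (0, 1)  -- count only the heap or clustered index
GROUP BY s.name, t.name
ORDER BY row_count DESC;
```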

1: Use a tool like SSDT to deploy your schema to dev and staging, keep the domain data as a set of INSERT statements in SSDT and apply those, then write a custom SSIS package (or use the Import and Export Wizard, which is just SSIS under the hood) to pull only the useful, recent data over the wire (see the sketch after this list).

  • PRO: this will definitely meet your requirement to pull only the minimum necessary data.
  • CON: lots of queries go over the wire through SSIS, you have to develop a separate package for pretty much every table, and you have to keep those packages up to date as schema changes hit your db.
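
An SSIS data flow doesn't lend itself to a short listing, but each package boils down to a date-filtered copy per table. A minimal T-SQL sketch of the same idea, assuming a hypothetical linked server named PRODSRV and a placeholder dbo.Orders table:

```sql
-- Hypothetical per-table copy: pull only the last month of rows from
-- production over a linked server. PRODSRV, ProdDB, dbo.Orders and its
-- columns are placeholders for your actual objects.
INSERT INTO dbo.Orders (OrderID, CustomerID, OrderDate, Total)
SELECT OrderID, CustomerID, OrderDate, Total
FROM PRODSRV.ProdDB.dbo.Orders
WHERE OrderDate >= DATEADD(MONTH, -1, GETDATE());
```

In SSIS proper, the same WHERE clause goes into each source query so only the rows you need cross the wire.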

2: Back up the whole database, restore it somewhere else, and run a purge script (sketched below). Then back up the purged db and restore that to dev and staging.

  • PRO: a pretty simple solution, requiring only DELETE scripts for the older data.
  • CON: requires another server with enough space to restore your full backup, a lot of time for the first restore, and probably more time for the deletes.
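
The purge itself is just a set of DELETEs against the restored copy. A minimal sketch, again assuming a placeholder dbo.Orders table with an OrderDate column; deleting in batches keeps the transaction log from ballooning:

```sql
-- Hypothetical purge on the restored copy: remove everything older
-- than one month, in batches so the log stays manageable.
DECLARE @cutoff datetime = DATEADD(MONTH, -1, GETDATE());
DECLARE @rows int = 1;

WHILE @rows > 0
BEGIN
    DELETE TOP (50000) FROM dbo.Orders
    WHERE OrderDate < @cutoff;

    SET @rows = @@ROWCOUNT;  -- loop until no rows remain to delete
END;
```

After the purge, shrink the data files and take a fresh backup; that small backup is what you restore to dev and staging.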

3: Partition your big tables so that data older than a month goes to a separate filegroup, and use the piecemeal restore option to restore only your PRIMARY filegroup (see the sketch after this list).

  • PRO: A very simple restore process.
  • CON: Requires design work before the fact to partition the tables.
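
A minimal sketch of that design, with placeholder names and a placeholder boundary date: big tables are created on a partition scheme so that old rows land on a secondary filegroup, and dev/staging then restore only PRIMARY.

```sql
-- Keep rows older than the boundary on an ARCHIVE filegroup and
-- current rows on PRIMARY. All names and the date are placeholders;
-- in practice the boundary would be shifted forward each month.
ALTER DATABASE ProdDB ADD FILEGROUP ARCHIVE;
-- (a data file must also be added to the ARCHIVE filegroup)

CREATE PARTITION FUNCTION pfByMonth (datetime)
AS RANGE RIGHT FOR VALUES ('2013-06-01');

CREATE PARTITION SCHEME psByMonth
AS PARTITION pfByMonth TO (ARCHIVE, [PRIMARY]);

-- Big tables are created on the scheme, partitioned by a date column:
-- CREATE TABLE dbo.Orders (...) ON psByMonth (OrderDate);

-- On dev/staging, a piecemeal restore brings back PRIMARY only:
RESTORE DATABASE ProdCopy
    FILEGROUP = 'PRIMARY'
    FROM DISK = 'C:\Backups\ProdFull.bak'
    WITH PARTIAL, RECOVERY;
```

Under the simple recovery model the single WITH PARTIAL restore shown is enough; under the full recovery model you would also need to restore log backups to bring the partial restore online.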

In our production environment we have a 1.5 TB database, and we use option 2. I have tested option 3 with smaller databases and it works rather well. Option 1 requires no changes to your db, but it seems like the most work to me.