SQL Server – Best Practices for Non-Production Database Refreshes and TDE

best practicessql serversql server 2014transparent-data-encryption

Say we have a production system with several databases encrypted using TDE and a self-generated certificate.

We regularly need to take this database and refresh our non-production systems. This process involves backing up the source database, restoring it to pre-production, obfuscating personal data and a number of other items.

The key fact here is that the pre-production system has a copy of the encryption certificate from production

I am looking at ways to make this process more secure – I'm not happy with the certificate sat on the pre-production environment. Is there another way that will:

  1. Never have an unencrypted backup saved anywhere
  2. Not allow propagation of personal data from production into non-production systems?

Best Answer

As Brent points out: real data (if even the smallest part of it is at all sensitive) should not be on dev/test[1] systems at all, even temporarily.

If you do depersonalise real data for use in test environments there are some precautions that we take when we do that:

  1. Again: real data should not be on dev/test systems at all, even temporarily. Restore the copy and depersonalise on signed-off production environments with the full security measures that implies, then move the munged data to your other environments. You may not want this extra load on your main production setup[2] in which case have a separate DB server in that environment just for this task[3].

  2. Make sure that your process fails safe and fails hard, even if that means it fails often. Make it fall over if any new tables or columns are present, until you give it a new list of what to expect, to mitigate against the issue Brent discusses with temporary schema changes. Also make sure any failed step stops the whole process so that any error altering the data will block the move away from production.

Best practise[4] is to generate test data based on patterns seen in production rather than taking production and trying to make it unidentifiable without altering any patterns in the data. As well as removing the risk of accidentally using real data due to a process error, you can include in your test data edge cases that have not occurred in production (yet, that you know of).

Even if your depersonalisation works, you might find that anyone with a little determination can partially undo it if they know a bit about the organisation of your clients.

the pre-production system has a copy of the encryption certificate from production

This is your red flag. Wave it at anyone above you who questions the extra expense and/or hassle of keeping the production data away from non-production environments. Unnecessary copies of keys can greatly enlarge your attack surface area.

[1] Some might suggest an exception with UAT services but I would consider that production anyway.

[2] Unless you serve a fairly specific set of timezones, for instance all your customers and the vast majority of their users are in the US, so have a large window in the night during which a performance hit won't matter

[3] It doesn't necessarily need to be expensive "production grade" kit (high performance drives, enterprise licensed SQL and OS, ...), just enough for this process to be fast enough as you won't be seeing concurrent end-user activity, but there will of course still be a cost involved

[4] Which we have moved to for all recent and future development - we only use the munging of production data like this on certain legacy services that are retiring soon enough that is isn't worth retooling before they are fully obsoleted.