Sql-server – Data obfuscation in SQL Server

sql serversql-server-2008-r2

What is the best practice for Data Obfuscation in SQL Server?

We'd like to use masked Production data in our UAT system.

If we want to do it quickly, and with a higher level of obsfucation, what approach should be taken? I'm thinking about Character scrambling for people's given name and family name, but how? Should I create a function myself or there is any predefined functions available to use? I don't want to spend time re-inventing the wheel 🙂

How about for date fields? For example should date of birth be randomly picked from the whole table and assigned to a record, or there is a better way of doing that?

Best Answer

I wish I could upvote you 100 points just for thinking about this! I have seen this subject overlooked so many times it's untrue - so well done. From what I understand you actually want to scramble the data within the fields themselves, and although I understand what you are trying to achieve it might not be quite necessary to do so - although it should be considered on a case-by-case basis.

Most data protection laws revolve around the ability to correctly associate a piece of data with an individual - for example a date of birth or a phone number. You can meet the requirements of the law by ensuring that when you move your data out of production into UAT it is jumbled up so it is not easily re-mapped to the original person - especially when you jumble forename and surnames.

However, this does not address the issue for instance of let's say contact details. You can meet the requirements of the law by jumbling the data but the phone numbers are still real, the emails still real etc... they are just not assigned to the correct person. For this I recommend if at all possible clearing that data before passing it into UAT, Red Gate do a piece of software called Data Generator that can create random test data for you so that you can repopulate the fields with data that can be tested against.

As for data scrambling: there exists many applications that do this for you and honestly you are correct in not wanting to reinvent the wheel. The one that we use at our company is a product called Data Masker by a company called Net2000. The license is pretty cheap, it works extremely fast and you don't have to worry about having to disable all your constraints before scrambling the database.

You can of course roll your own solution should you not find anything that meets your requirements - if you do decide to do this I would strongly recommend using CLR procedures to do it as it is much more flexible than pure TSQL (not to say that you can't use TSQL see here).

Once you have chosen an application to perform this for you the next thing you need to decide is what is it you actually want/need to scramble? Honestly your best resource for this is your company legal team and or the company auditors. I know that sometimes we may not like working with them but they will be much nicer to you for approaching them and asking them the question rather than trying to do it on your own and get it wrong, there is absolutely nothing wrong with asking for help - especially when it is as important as this.

I hope this helps you and I wish you good luck in your quest... ;-)