SQL Server – Best Practices for Providing PHI/PII Data to Users in Organization

azureazure-sql-databasesql server

I am looking for best practices for how to share data from a SQL Server database that contains PHI/PII to individuals that cannot view PHI/PII. In short, we maintain a SQL server database that contains 30+ columns of PHI/PII. We need to provide datasets for certain individuals that cannot access the PHI/PII columns, but can access the other fields to conduct different types of analyses.

Current structure: The database is 100GBs and is updated 4 times per day. All data resides on Azure SQL Server. The tables should maintain metadata so blob storage is not ideal (unless someone has an idea to maintain metadata in blob storage). The data will be accessed through PowerBI or Azure Databricks.

Several options come to mind:

  1. Create a DB Role and deny access to PHI/PII columns
  2. Create a new SQL server database and an ETL that copies non-PHI/non-PII data from one database to the new database
  3. Create a new schema and then create views of the tables which do not contain PII/PHI columns and then restrict access to users for this schema only
  4. Build an script/ leverage Azure Data Factory to copy data from the database to Azure table.

Security is number one priority. Any advice is much appreciated.

Best Answer

My first choice would be to use the traditional security mechanisms, which has been in place for decades. I.e., only grant permissions on what they should see.

If they are about to write their own queries, they might be annoyed if you grant some columns but not all (1), since SELECT * will choke on that. And, yes, some will do SELECT *, whatever we say. So, (3) gets my vote.

You might run into the Dynamic Data Masking track, and this might be useful if they aren't writing their own queries. Since you say that security is #1 priority, then you probably aren't OK with them being able to deduce info. Basically, the data "is there" through the WHERE clause but not SELECT list. I recently read an article where CROSS APPLY was used as a type of brute force data attack to get the SSN even though it was masked.