Postgresql – encrypt data without performance issues

encryptionpostgresqlSecurity

I've googled around for the last 45 minutes or so and it seems like that I can't figure out what kind of options I got when it comes up to encryption in databases.

Problem:

I got a SaaS product and the data is kinda sensitive.
I want to give my customers 100% security on the fact that I can't read their data.

Is this even possible?
What kind of possibilities do I got?

Best Answer

I got a SaaS product and the data is kinda sensitive. I want to give my customers 100% security on the fact that I can't read their data.

If it's software as a service, and you do more than simply store blobs of data then give those blobs back when the customer asks, this is pretty much impossible.

You're dreaming. You can't realistically combine SaaS and the inability to see the customers' data.

(About the only outfit I know that does so is SpiderOak, and it does so by supporting almost no features - it stores stuff, and then it retrieves that stuff. That's it.)

Is this even possible? What kind of possibilities do I got?

Few.

You can't index encrypted data unless you know the key - and the index will contain decrypted values.

You can't write a WHERE clause that filters based on the encrypted values unless you know the key. You could get the client to supply a pre-encrypted key to compare if it's a simple equality test, but you can't do b-tree index scans (requires > and < operators), or much else of interest.

You can't aggregate values unless you know the key.

You really can't do anything useful with them unless you know the key.

In theory you can use homomorphic encryption to allow computations on encrypted values. In practice it's incredibly limited, slow, and impractical, as well as weaker than other crypto. In practice most real-world homomorphic systems are useful for sums and that's about it.

Related Solutions

Postgresql – Database encryption and key management with pg_crypto

Note that I am not a crypto systems or security expert. This is what I've seen done and it makes sense, but I cannot claim it doesn't have security issues beyond what I outline below. As always, get a proper security audit if it's important.

In the following, "credential" can refer to ssh keys, passwords, or whatever needs storing.

The typical model is to use a couple of levels of crypto.

For each user, generate and store a public/private keypair
Store the public part of the user's key in the clear in the user's record
Store the private part of the user's key encrypted to the user's passphrase in the user's record.
Encrypt each credential to a symmetric key that's unique to that credential
Encrypt each credential's symmetric key to the public key of every user who should have access to that key, and store the resulting encrypted material in a user_credentials mapping table.

Essentially, you maintain a keyring, much like the gpg keyring where the public part is in cleartext and the private part is encrypted to a passphrase. You should make sure the passphrase selected by each user is strong.

Now you can generate a symmetric key for each encrypted credential. Encrypt this key to every user's public key and store it in a user-credentials join table. Never store this symmetric key unencrypted, and use a different symmetric key for each credential.

You land up with something like this

[user]
user_id
username
login_password_salt
login_password_hash
public_key_text
private_key_sym_encrypted_to_passphrase

[user_credential]
user_id
credential_id
credential_key_encrypted_to_user_public_key

[credential]
credential_id
credential_data_encrypted_to_credential_key

There are quite a few advantages to doing things this way:

If a dump of your database is leaked your credentials are still safe, or as safe as your passphrases and key strengths.
Since you have a different symmetric key for each credential you can control which users have access to which credentials
Any user with access to a given credential can give another user access to it; they just decrypt it using their key and encrypt it to the other user's public key. You don't need to get everybody together to add credentials or add new users.
Because you're encrypting each credential to a symmetric key, and then encrypting that key to the user's key, you can change the stored credential without having to re-encrypt it to each user's key. It just makes updating stored credentials more convenient.

However, if the database host is compromised, an attacker can easily extract the passwords any given user has access to as soon as that user logs in. They capture the user's passphrase by wrapping functions or by enabled detailed logging and searching the logs. Then they can decrypt passwords in a stolen dump or from the live DB. A compromised database host is no danger until a user logs in to use it but then all is lost.

This means that the host of the database is security critical. Leaking the dump isn't fatal to security, but someone being able to modify the code running on the DB is a fatal security breach.

In addition to using your credentials management DB to extract ssh key material, passwords, etc when you need it, you can also use it as an authentication proxy where it can auth against a service without your user having to ever be able to learn the credential used. For example, your app might decrypt an ssh key, add it to an ssh agent, and use the decypted key to log in to a server via the agent without the user ever being able to see and access the key directly.

Sql-server – Best practices to encrypt all the data

Your problem boils down to access control.

The first defense I'd propose is to simply deny access to the untrusted users. If they can't get into the database, they can't query the database and get at the sensitive data.

If they must be allowed to access the database server, you can look at either explicitly granting them read permission to the tables they must access to perform their job tasks. Alternatively, you can leave their current permissions alone and simply revoke their ability to query tables or columns within those tables with sensitive data.

The reasons I'd start with this approach is that there is overhead to encryption. There is key management as well as the CPU cost of encrypting/decrypting the data. As I understand it, there's also going to be a performance cost of looking up data as the encrypted data if it's in an index, the encrypted value will be indexed and not the source value. The net result is that 2013-06-11 and 2013-06-12 could have been stored in contiguous disk locations but once they're encrypted they could be on opposite ends of the disk and your simple range query which used to perform well now sucks hind teat.

All of that said, Steve Jones had a good presentation on the Encryption Primer. I used some of that content plus this article on encryption in the DB but not in the cube to get our encryption stuff up and running.

As others have pointed out in the comments and other answers, despite having Encryption as part of its name, TDE, isn't going to protect the data in the database itself. The purpose of TDE is to protect your backups from unauthorized access. It has nothing to do with encrypted packets flowing across the network or automagically creating PCI compliance. It simply ensures that if our offsite backup tapes are lost, stolen or simply snooped, people will not be able to use them without the key.

The other point to access control that I failed to mention is physical access. If the concern is the junior admin accessing data they shouldn't, then in addition to denying them access to the database, don't overlook basics of preventing them from accessing the machine itself---either through remote desktop or physically logging onto the machine. Once they are able to get onto the box, there's nothing preventing them from restarting SQL Server into single user mode and removing your access controls.

Best Answer

Related Solutions

Postgresql – Database encryption and key management with pg_crypto

Sql-server – Best practices to encrypt all the data

Related Question