Encryption – Best Way to Store User-Encrypted Data

dbmsencryption

I'm about to develop an application that will mainly store user-encrypted data. Each piece of data being encrypted thanks to an AES key, itself encrypted with the public RSA key of each granted user.

Given that the volume of non-encrypted data is negligible (probably only users login, some dates and foreign keys), I wonder if "standard DBMS" such as PostgreSQL or MySQL would be a good choice.

Indeed, these DBMS are optimized to handle different types of data (text, integers, dates, …), indexes, process calculus, aggregations and so on.

In my case, the vast majority of the data I will store would be large blocs of text (or maybe binary data). And most of the required calculus will be processed on the client side, after decryption.

Therefore, I won't either be able to parse my data. So even if I had a huge database, it wouldn't be qualified as "Big data". However, I wonder if MongoDb, MemSQL or whatever wouldn't be a more accurate choice. And if not, what would be the best DBMS and the best way to use it for my purpose ?

In other words, I think each DBMS have to make sacrifices in order to be more efficient on most important features. I also think the case detailed above is not that casual. So I assume there is a lot of features I don't need (or I can't use, such as FULLTEXT search for instance). On the other hand, I may need features that are discarded by "standard DBMS".

Best Answer

As a rule of thumb: If your data is well structured, well known (in advance) and of a limited size per entry (no mega BLOBs), relational databases are really good at storing it. Even if you don't use the advanced indexing features.

Managing space, especially empty space in data files, is a very hard problem to solve. Relational databases have been dealing with this for over 20 years now - and it is worth making use of them just for that. In addition to this, relational databases get you the benefits of a long history of performance tuning. They run highly optimised native code so you don't have to struggle with poor Java implementations, bad network stacks, overuse of memcpy, garbage collection, coarse locking and all the other pathologies new products (especially the noSQL stuff) tends to have.

To store encrypted data, use the BINARY data types. MSSQL, MySQL and Postgres all support these types. You may not want to do operations directly on these types (though you CAN do binary manipulation if you want to). Also note that it is relatively easy to implement the encryption/decryption in the database itself as all of the above support crypto libraries. You will also benefit from indexing on the key/foreign columns so you can find your data fast. A relational database is an excellent key/value store for small value types - SQL Server will easily get you over 1-10M key/value lookups/sec even on a small box - I expect MySQL and PostGres would deliver results in the same ballpark.

It is also easy to find programmers who can query a relational database efficiently. SQL has been around for a very long time and it is an extraordinarily powerful language. MSSQL will even give you automated parallel execution. Some programmers wont "get" it - but if they don't, chances are they also won't grok parallelism or lambda expressions (two crucial skills of a modern coder).

On top of all of this goodness, you also get powerful backup and monitoring tools for all of the standard relational databases.

So basically, unless you have a REALLY good reason to use NoSQL - just use relational databases.

Related Solutions

SQL Server Encryption – Is Data Encrypted by ENCRYPTBYPASSPHRASE() Protected by Service Master Key?

No, this is not true, and there is an easy proof. On one server, create a database, and store some data using ENCRYPTBYPASSPHRASE():

CREATE DATABASE blat;
GO
USE blat;
GO
CREATE TABLE dbo.mort(floob INT, splunge VARBINARY(64));
GO
INSERT dbo.mort VALUES
  (1, ENCRYPTBYPASSPHRASE(N'kerplunk', N'secret')),
  (2, ENCRYPTBYPASSPHRASE(N'kerplunk', N'hidden'));

Now, backup the database:

BACKUP DATABASE blat TO DISK = 'C:\wherever\blat.bak' WITH INIT;

Now, on a different server in a different domain or wherever you feel comfortable believing that there are no hidden attachments to some service master key, restore the database:

RESTORE DATABASE blat FROM DISK = 'C:\wherever\blat.bak'
  WITH REPLACE, RECOVERY, 
  MOVE 'blat' TO 'C:\somepath\blat.mdf',
  MOVE 'blat' TO 'C:\somepath\blat.ldf';

Finally, retrieve the data from the restored copy, and observe that you can still decrypt it:

USE blat;
GO
SELECT floob, prying_eyes = CONVERT(NVARCHAR(4000),
  DECRYPTBYPASSPHRASE(N'kerplunk', splunge))
FROM dbo.mort;

Results should be:

floob   prying_eyes
-----   -----------
1       secret
2       hidden

So, this means that if a user gets your data and knows the pass phrase (e.g. if you've stored it somewhere in the same database), they can decrypt all of the data without ever worrying about keys or anything else. You might infer from this that ENCRYPTBYPASSPHRASE() is not a very safe way to store passwords, unless you never store the pass phrase a user entered, or unless you use other tools and methods on top of this (scope creep).

As a side note, I will confess that I have absolutely no idea how ENCRYPTBYPASSPHRASE() actually works internally, other than the fact that it uses Triple DES (3DES) encryption. It is non-deterministic - so it might use something like NEWID() or RAND() internally - your guess is as good as mine. I do know that if you run the following, you will get a different binary value each time, and if you take any of the individual output values and ran DECRYPTBYPASSPHRASE() against that value, you'd still end up back with the original:

SELECT ENCRYPTBYPASSPHRASE(N'banana', N'turkey');
GO 5

In fact, you can try this on your own system. Here is one of the binary values I generated, and if you reverse the process, you should get turkey too:

SELECT CONVERT(NVARCHAR(4000), DECRYPTBYPASSPHRASE(N'banana', 
  0x010000007854E155CEE338D5E34808BA95367D506B97C63FB5114DD4CE687FE457C1B5D5))

Mysql – the best way to store SSH fingerprints and public keys

If I'm not mistaken, the fingerprint is a hexadecimal number. That means you could store it as a binary number, since two hexadecimal characters can be represented by a single byte, making a 32-character hexadecimal string = 16 bytes.

Here's the documentation on binary and varbinary data types for MySQL 5.7 (I don't have a test environment handy).

I do something similar for IP addresses based on this question and answer from StackExchange. Store the data as binary, convert it back to text when it needs to be displayed or compared against another string. Create a view that calls the functions as a part of it so you can easily get and store the full string when you want it.

Also, public keys are indeed public, and are often stored in searchable databases like this one.

Best Answer

Related Solutions

SQL Server Encryption – Is Data Encrypted by ENCRYPTBYPASSPHRASE() Protected by Service Master Key?

Mysql – the best way to store SSH fingerprints and public keys

Related Question