Encryption – Best Way to Store User-Encrypted Data

dbmsencryption

I'm about to develop an application that will mainly store user-encrypted data. Each piece of data being encrypted thanks to an AES key, itself encrypted with the public RSA key of each granted user.

Given that the volume of non-encrypted data is negligible (probably only users login, some dates and foreign keys), I wonder if "standard DBMS" such as PostgreSQL or MySQL would be a good choice.

Indeed, these DBMS are optimized to handle different types of data (text, integers, dates, …), indexes, process calculus, aggregations and so on.

In my case, the vast majority of the data I will store would be large blocs of text (or maybe binary data). And most of the required calculus will be processed on the client side, after decryption.

Therefore, I won't either be able to parse my data. So even if I had a huge database, it wouldn't be qualified as "Big data". However, I wonder if MongoDb, MemSQL or whatever wouldn't be a more accurate choice. And if not, what would be the best DBMS and the best way to use it for my purpose ?

In other words, I think each DBMS have to make sacrifices in order to be more efficient on most important features. I also think the case detailed above is not that casual. So I assume there is a lot of features I don't need (or I can't use, such as FULLTEXT search for instance). On the other hand, I may need features that are discarded by "standard DBMS".

Best Answer

As a rule of thumb: If your data is well structured, well known (in advance) and of a limited size per entry (no mega BLOBs), relational databases are really good at storing it. Even if you don't use the advanced indexing features.

Managing space, especially empty space in data files, is a very hard problem to solve. Relational databases have been dealing with this for over 20 years now - and it is worth making use of them just for that. In addition to this, relational databases get you the benefits of a long history of performance tuning. They run highly optimised native code so you don't have to struggle with poor Java implementations, bad network stacks, overuse of memcpy, garbage collection, coarse locking and all the other pathologies new products (especially the noSQL stuff) tends to have.

To store encrypted data, use the BINARY data types. MSSQL, MySQL and Postgres all support these types. You may not want to do operations directly on these types (though you CAN do binary manipulation if you want to). Also note that it is relatively easy to implement the encryption/decryption in the database itself as all of the above support crypto libraries. You will also benefit from indexing on the key/foreign columns so you can find your data fast. A relational database is an excellent key/value store for small value types - SQL Server will easily get you over 1-10M key/value lookups/sec even on a small box - I expect MySQL and PostGres would deliver results in the same ballpark.

It is also easy to find programmers who can query a relational database efficiently. SQL has been around for a very long time and it is an extraordinarily powerful language. MSSQL will even give you automated parallel execution. Some programmers wont "get" it - but if they don't, chances are they also won't grok parallelism or lambda expressions (two crucial skills of a modern coder).

On top of all of this goodness, you also get powerful backup and monitoring tools for all of the standard relational databases.

So basically, unless you have a REALLY good reason to use NoSQL - just use relational databases.