Schema-less/flexible + ACID Database

database-designnosqlschema

I am looking at rewriting a VB based on-premise (locally installed) application (invoicing+inventory) as a web based Clojure application for small enterprise customers. I am intending this to be offered as a SaaS application for customers in similar trade.

I was looking at database options: My choice was an RDBMS: Postgresql/ MySQL. I might scale up to 400 users in the first year, with typically a 20-40 page views/ per day per user – mostly for transactions not static views. Each view will involve fetch data and update data. ACID compliance is necessary(or so I think). So the transaction volume is not huge.

It would have been a no-brainer to pick either of these based on my preference, but for this one requirement, which I believe is typical of a SaaS app: The Schema will be changing as I add more customers/users and for each customer's changing business requirement (I will be offering some limited flexibility only to start with). As I am not a DB expert, based on what I can think of and has read, I can handle that in a number of ways:

  1. Have a traditional RDBMS schema design in MySQl/Postgresql with a single DB hosting multiple tenants. And add enough "free-floating" columns in each table to allow for future changes as I add more customers or changes for an existing customer. This might have a downside of propagating the changes to the DB every time a small change is made to the Schema. I remember reading that in Postgresql schema updates can be done real time without locking. But not sure, how painful or how practical is it in this use case. And also, as the schema changes might also introduce new/ minor SQL changes as well.
  2. Have an RDBMS, but design the database schema in a flexible manner: with a close to entity-attribute-value or just as a key-value store. (Workday, FriendFeed for example)
  3. Have the entire thing in-memory as objects and store them in log files periodically.(e.g., edval, lmax)
  4. Go for a NoSQL DB like MongoDB or Redis. But based on what I can gather, they are not suitable for this use-case and not fully ACID compliant.
  5. Go for some NewSQL Dbs like VoltDb or JustoneDb(cloud based) which retain the SQL and ACID compliant behaviour and are "new-gen" RDBMS.
  6. I looked at neo4j(graphdb), but not sure if that will fit this use-case

In my use case, more than scalability or distributed computing, I am looking at a better way to achieve "Flexibility in Schema + ACID + some reasonable Performance". Most of the articles I could find on the net speak of flexibility in schema as a cause leading to performance(in the case of NoSQL DBs) and scalability while leaving out the ACID/Transactions side.

Is this an "either or" case of 'Schema flexibility vs ACID' transactions or Is there a better way out?

Best Answer

Option 1

There are several reasons for this, which I'll explain below. First, here's how to do it.

  • Use your choice of standard RDBMS platform.

  • Set up your schema with several user-configurable fields, and make your application facilitate the configuration on a per-tenant basis.

  • From the per-tenant metadata, you can create a per-tenant view of their data, which has the filters built in, and the columns named from your metadata. Any reports provided can also inherit the metadata. If they want to do M.I. off the data then provide them an extract of the transactional data, or maybe some additional MIS application on a different server if they will pay for that.

  • Don't try to provide more customisation than this (i.e. no radical changes to the schema) unless the client is prepared to pay for their own private instance and maintaing a custom build.

The reasons behind this are:

  • These database systems will handle the sort of volumes you describe on fairly ordinary hardware. You don't really have the sort of transaction volume that merits a NoSQL database. Unless you have some other architectural reason to want one, there's not much point in going bleeding-edge.

  • They're mature, well-understood technologies.

  • System management, backup/restore, replication, reporting and disaster recovery are all well sorted on RDBMS platforms.

  • You can get client libraries including JDBC for all major RDBMS platforms.

  • Views can be used for the per-user customisation and generatedfrom your application metadata.

  • It's substantially more efficient than XML fields or EAV structures.