Mysql – Storing dynamic data in NoSQL

database-designmongodbMySQLnosql

I have scenario where I need to store unstructured data but the rest of my data is structured and relational. An example of the type of unstructured data is as explained below:

User Type 1:

How do you blah blah : 5 fields

User Type 2 :

How do you blah blah : 3 fields

User Type 3 :

How do you blah blah : 7 fields

All 3 types are asked the same question "How do you blah blah" but each user type answers it using different number of fields. And there can be a lot of different user types.

For the relational data, I'm using MySQL but i'm a little confused on how to store this unstructured data:

  1. Serialize as JSON and store in MySQL
  2. Use NoSQL

My requirements are high reads, average updates, average inserts & no deletes. No JOINS needed. I need guaranteed writes & high availability. If I were to choose a NoSQL, it'd be a AP type according to the CAP theorem. I won't be hitting millions of records anytime soon.

I also plan to provide a text search for this data in the future, but it doesn't need to be a real time search, so I can always index the data using Lucene periodically. But of course, document based NoSQL implementations do provide this out of the box. But I have read in a few places where people have advised against storing JSON data in MySQL. But adding a NoSQL layer could be overkill.

What do I do & if you do advise me to go for NoSQL DB, which one should I choose?

Best Answer

I'll answer your question in an orthodox manner, with a twist of heresy:

Orthodoxy: You shouldn't store data in a column in a relational database that isn't "atomic."

Heresy: In your specific situation, you could -- maybe -- consider this blob of JSON to be an atomic object.

Years ago, Chris Date said it like this:

"A relation is said to be in first normal form (abbreviated 1NF) if and only if it satisfies the condition that it contains scalar values only"

Date, C.J. An Introduction to Database Systems, 6th edition (Addison-Wesley, 1995)

Later, he took a somewhat softer stance:

"1NF just means each tuple in the relation contains exactly one value, of the appropriate type, for each attribute. Observe in particular that 1NF places no limitations on what those attribute types are allowed to be."

Date, C. J. Database Design and Relational Theory: Normal Forms and All That Jazz (OReilly Media, 2012)

The "exactly one value" I'm arguing for here is "exactly one JSON object" (which could, in turn contain a JSON array).

Storing things in JSON in a column is a bad idea if you need to the DBMS to manipulate it in any way, since, of course, it can't be properly indexed like properly normalized data can be... but, arguably, if you really really really don't intend for the DBMS to do anything with what you're storing other than write and read it, the case could be made to store a chunk of JSON in a single column, claiming the JSON array of values to be a single atomic value.

The big objection, I think, to doing this, is when it's done out of a lack of familiarity with the relational model or out of laziness or naivete. Obviously, there are a lot of ways it could be done wrong, but I'd suggest that there's nothing inherently wrong about storing a chunk of JSON in a database, As Long As You Know What You're Doing.™

And, of course, you could use a MySQL FULLTEXT index on it, now that those are supported in InnoDB as of (MySQL 5.6).