MongoDB – How to change this relational model to fit a NoSQL database

mongodb, normalization, nosql, relational-theory

Because of things beyond my control, I have to switch from a relational database to MongoDB for the advertising platform I'm helping to build. I've got a relational schema sketched out, but I'm unsure where to begin switching to NoSQL. I'd like to know which tables it makes sense to embed, and where it makes sense to denormalize.

Here's my current schema (as you can see, it's pretty normalized):

Users: id, name, email, paypal_email, password, balance, created_at, updated_at

Clicks: id, bid_id, server_id, origin_ip, valid, created_at

Ads: id, user_id, title, content, link, created_at, updated_at

Bids: id, ad_id, continent, current, auto, max, created_at, updated_at

Servers: id, user_id, ip, created_at, updated_at

People: id, username, ip, server_id, created_at, updated_at

Transactions: id, amount, transactable_id, transactable_type, created_at

We'll be making pretty heavy use of these tables. I expect many click creations, people updates, and transaction creations per second. We obviously want to be able to provide statistics like clicks per ad, impressions, etc. This information will be updated in real time on our site.

Finally, every day or two we'll be running large queries against clicks in relation to a couple of the other tables.
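For illustration, here's a rough sketch of one document layout I've been considering (just an assumption on my part: embedding bids inside their parent ad, and denormalizing ad_id onto each click so per-ad stats don't need a join):

```python
# Rough sketch only; field names mirror the tables above and the embedding
# choices are assumptions, not a settled design.

ad_doc = {
    "user_id": "...",                       # reference back to the owning user
    "title": "Example ad",
    "content": "...",
    "link": "https://example.com",
    "bids": [                               # Bids embedded instead of a separate table
        {"continent": "EU", "current": 0.10, "auto": True, "max": 0.25},
    ],
}

click_doc = {                               # Clicks kept in their own collection
    "ad_id": "...",                         # denormalized so clicks-per-ad needs no join
    "bid_id": "...",
    "server_id": "...",
    "origin_ip": "203.0.113.7",
    "valid": True,
    "created_at": "2013-01-01T00:00:00Z",
}
```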

Is this realistic to do with something like MongoDB?

Best Answer

"Is this realistic to do with something like MongoDB?"

If "realistic" means can be engineered to produce acceptable performance then the answer depends on whether you will do with appropriate benchmarking before putting it into production. You can replace "MongoDB" in your question with anything from Access to Oracle and the answer remains basically the same.

The MongoDB documentation has some good information on considerations for data modeling and performance. I would start there and lean towards a "collection" for each of your original source tables.
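As a minimal sketch of that starting point (using pymongo; the connection string, database name, and index choices are assumptions based on your table list, not recommendations):

```python
from pymongo import MongoClient, ASCENDING

# One collection per original source table, with indexes on the
# foreign-key-style fields the workload will filter on.
client = MongoClient("mongodb://localhost:27017")   # assumed connection string
db = client["ads_platform"]                         # assumed database name

db.clicks.create_index([("bid_id", ASCENDING), ("created_at", ASCENDING)])
db.ads.create_index([("user_id", ASCENDING)])
db.bids.create_index([("ad_id", ASCENDING)])
db.servers.create_index([("user_id", ASCENDING)])
db.people.create_index([("server_id", ASCENDING)])
db.transactions.create_index([("transactable_type", ASCENDING),
                              ("transactable_id", ASCENDING)])
```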

The important step is to benchmark, benchmark and benchmark. I would approach this from the standpoint of proving that MongoDB can't handle the workload. There are three possible outcomes:

  1. You prove that it won't work. Maybe it's due to technical limitations or it would require excessive HW to meet your SLA. If they proceed you can't be held responsible ("I told you so").

  2. You fail to prove that it won't work, it goes into production, and it doesn't work; at least you tried. Everyone is surprised and there is a big scramble for an alternate solution.

  3. You fail to prove that it won't work and it works great. You will sleep well at night feeling confident in the solution that was forced down your throat.

Start by benchmarking the "critical paths". If you can't get MongoDB to handle the incoming click traffic then it's "case closed". Start with an aggressive goal, maybe 300% of the traffic the business users project. Figure out how many documents/sec you can load with your initial schema. Is it reasonable? Change the schema and benchmark again. Is it better or worse? How does it scale with additional HW? Try scaling UP and OUT. Try multiple clients performing inserts. Next look at the updates. Can you meet your performance goals?
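As a rough illustration of what such a benchmark might look like (the document shape, batch size, and target count are placeholders; a real test would also run multiple client processes on production-like hardware):

```python
import time
from datetime import datetime, timezone
from pymongo import MongoClient

# Crude single-client insert benchmark: load N synthetic click documents
# and report documents/sec.
client = MongoClient("mongodb://localhost:27017")   # assumed connection string
clicks = client["ads_platform"]["clicks"]

N = 100_000        # placeholder; scale to ~300% of projected click traffic
BATCH = 1_000      # placeholder batch size; vary this between runs

start = time.perf_counter()
for i in range(0, N, BATCH):
    batch = [
        {
            "bid_id": i + j,
            "server_id": (i + j) % 50,
            "origin_ip": "203.0.113.7",
            "valid": True,
            "created_at": datetime.now(timezone.utc),
        }
        for j in range(BATCH)
    ]
    clicks.insert_many(batch, ordered=False)        # unordered bulk insert
elapsed = time.perf_counter() - start
print(f"{N / elapsed:,.0f} documents/sec")
```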

Lastly look at reporting. Reporting performance is typically an easy problem that is solved by additional hardware. If the reports are run only every day or so then you should have some leeway in expected performance.
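A clicks-per-ad report, for example, can be expressed as a single aggregation; this sketch assumes each click document carries a denormalized ad_id and a created_at timestamp:

```python
from datetime import datetime, timezone, timedelta
from pymongo import MongoClient

# Daily report sketch: count valid clicks per ad over the last 24 hours.
client = MongoClient("mongodb://localhost:27017")   # assumed connection string
clicks = client["ads_platform"]["clicks"]

since = datetime.now(timezone.utc) - timedelta(days=1)
pipeline = [
    {"$match": {"valid": True, "created_at": {"$gte": since}}},
    {"$group": {"_id": "$ad_id", "clicks": {"$sum": 1}}},
    {"$sort": {"clicks": -1}},
]
for row in clicks.aggregate(pipeline):
    print(row["_id"], row["clicks"])
```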

Conclusion

There is nothing special about MongoDB compared to the evaluation of any other database technology. Start with technical requirements and benchmark, benchmark, benchmark.