MongoDB – Using MongoDB as a Data Warehouse?

data-warehousemongodbMySQLperformance

Will mongodb a good choice as a data warehouse considering following situation?

  1. We are building analytics, report generation platform for our client and a typical table growth in terms of rows will be 2 million rows every year (We need 5 yr history at max, total tables ~ 50)
  2. We have proof of concept ready using mysql and it works but my worry is as we hit production the growth in data might create issues with mysql. mysql doesn't support horizontal scaling whereas sharding in mongodb makes it scale well
  3. On average once or twice a year schema of a table changes where either new column is being added or existing one is deleted (deletion is not a concern as we can always keep the column but how difficult it would be to add new column in mysql table containing few million records?). This is other reason we are thinking about mongodb as it is flexible in terms of schema changes
  4. Writes: Average 5000 records per table per day. Reads will be less then this and it will be performed for generating analytics.

I don't have much experience dealing with this kind of data volume both with mysql and mongodb hence pardon my ignorance if some of the things mentioned above are naive.

Best Answer

(Speaking from a MySQL perspective...)

10M rows is not that big -- perhaps only 1GB? It is hardly a candidate for sharding unless the system activity is really high.

2M rows/year = 25/second. 100/second is where I perk up an eyebrow and suggest looking more closely. 1K/sec is likely to need some special treatment.

ALTER TABLE .. ADD COLUMN .. -- There are techinques that make that not too bad. And with MySQL 8.0, there is a truely painless ADD COLUMN.

Consider having JSON column for "miscellany" that gets added later; do not add more columns unless you need the database engine to filter on them or sort on them.

For analytics, build and maintain Summary Tables. These make "report writing" 10x faster than scanning through 10M rows.

CTE and Windowing functions exist in MySQL 8.0 and MariaDB 10.2. But you may not need such.

5K Writes/day -- In addition to the 2M inserts/year? Not a big deal.