MySQL – How to keep database size low in MongoDB

database-size mongodb MySQL

Newbie here.

TL;DR : What is the best way to compact data in MongoDB? Will it use much more space than other database systems?

I want to create a database for something more or less like a forum. I started learning MySQL, but I don't think I need an RDBMS for what I want: most of my relations are 1 to 1 (at most 1 to many), and even if it means a little data redundancy, I'd rather store the information as field-and-value pairs in a document. I think reading a JSON-like document is faster than the many table joins I would otherwise need in MySQL. I also prefer JavaScript to SQL.

But looking at the storage requirements, I find nothing like TINYINT, CHAR(3), etc. I want to keep server storage costs down while still recording a lot of information (time, ratings, votes, comments, tags, karma, favorites, etc.). I hope I'm wrong, but MySQL seems to waste noticeably less space (although I imagine MongoDB compensates for this in performance).

What are the best practices for reducing size in MongoDB? And am I completely wrong in thinking MongoDB uses more space than MySQL and other DBs (disregarding the benefits of joining tables in MySQL to avoid data redundancy)?

So far, these have been the best articles I found on it:
How to limit MongoDB database size (force size limits on the database)
https://www.compose.com/articles/sizing-and-trimming-your-mongodb/ (cap database and release prefetched space)
https://docs.mongodb.com/manual/reference/command/compact/ (use compact, especially on WiredTiger DBs)
https://stackoverflow.com/questions/2966687/reducing-mongodb-database-file-size (repairing data, and same as above)
As @Fyodor Glebov points out in his answer, there are also two compression algorithms: snappy and zlib.

Thanks for any advice!

Best Answer

Because the data is not normalised, the storage requirements are generally bigger.

Don't forget about compression. zlib (similar to gzip) uses more CPU power than snappy, but the data takes up much less disk space. How much you gain depends on your data.

A MongoDB consultant ran benchmarks comparing the different compression algorithms, but that file is not public. Maybe someone from MongoDB can provide an improved answer ;)

  • No compression
  • Snappy (enabled by default) – very good compression, efficient use of resources
  • zlib (similar to gzip) – excellent compression, but more resource intensive
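To get a rough feel for the size side of this trade-off, here is a minimal sketch using only Python's standard library. It is not a MongoDB benchmark: the documents are made-up forum-like records, they are serialized as JSON rather than BSON, and snappy is not in the standard library, so zlib at level 1 (fast) and level 9 (best) stand in for the "cheaper" and "more expensive" ends. The names `sample_documents` and `compressed_sizes` are hypothetical helpers.

```python
import json
import zlib


def sample_documents(count=1000):
    """Build made-up, forum-like documents with repeated field names."""
    return [
        {
            "author": f"user{i % 50}",
            "votes": i % 7,
            "tags": ["mongodb", "mysql"],
            "comment": "Thanks for the answer!",
        }
        for i in range(count)
    ]


def compressed_sizes(documents):
    """Return the byte size of the data raw and at two zlib levels."""
    # Assumption: JSON stands in for BSON, so absolute numbers will differ
    # from what WiredTiger actually stores on disk.
    raw = json.dumps(documents).encode("utf-8")
    return {
        "raw": len(raw),
        "zlib_fast": len(zlib.compress(raw, 1)),  # least CPU effort
        "zlib_best": len(zlib.compress(raw, 9)),  # most CPU effort
    }


if __name__ == "__main__":
    for label, size in compressed_sizes(sample_documents()).items():
        print(f"{label}: {size} bytes")
```

Denormalised documents repeat the same field names and values over and over, which is exactly the kind of data that block compression shrinks well. Run it against a sample of your own data to see what the ratio looks like for you.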

See the startup parameter storage.wiredTiger.collectionConfig.blockCompressor:

Default: snappy

New in version 3.0.0.

The default type of compression to use to compress collection data. You can override this on a per-collection basis when creating collections.

Available compressors are:

  • none
  • snappy
  • zlib

storage.wiredTiger.collectionConfig.blockCompressor affects all collections created. If you change the value of storage.wiredTiger.collectionConfig.blockCompressor on an existing MongoDB deployment, all new collections will use the specified compressor. Existing collections will continue to use the compressor specified when they were created, or the default compressor at that time.
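The per-collection override mentioned above can be sketched roughly like this. The helper only builds the options for the `create` command (a `storageEngine.wiredTiger.configString` of `block_compressor=<name>`), so it runs without a server. `create_collection_options` is a hypothetical name, and the `forum` database and `posts` collection in the comment are assumptions for illustration.

```python
# The three compressors listed in the quoted documentation.
ALLOWED_COMPRESSORS = {"none", "snappy", "zlib"}


def create_collection_options(compressor):
    """Build create-collection options selecting a WiredTiger block compressor."""
    if compressor not in ALLOWED_COMPRESSORS:
        raise ValueError(f"unsupported compressor: {compressor!r}")
    return {
        "storageEngine": {
            "wiredTiger": {"configString": f"block_compressor={compressor}"}
        }
    }


if __name__ == "__main__":
    options = create_collection_options("zlib")
    print(options)
    # Assuming pymongo is installed and a mongod is running locally,
    # the options could be passed through like this:
    #   from pymongo import MongoClient
    #   MongoClient()["forum"].create_collection("posts", **options)
```

This only affects the collection being created; as the quoted documentation says, existing collections keep the compressor they were created with.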