MongoDB – How to manage RAM consumption in MongoDB


I am using MongoDB. I have a hardware device which continuously submits data to the database every 10 seconds for 4 hours a day. The data includes latitude, longitude, and other basic information about the device. I have indexed latitude and longitude for faster querying. I have used an event-based document creation approach: every time the device submits data, I create a document in my collection.

The average number of documents created per day will be 1,440, and my document size is 300 bytes. Since I have 3 indexes (including _id), I assume the RAM consumption per document is going to be 8,176 bytes * 3 = ~24 KB. With a million devices submitting data in a similar fashion, I will end up needing 33,000 GB of RAM.
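For concreteness, here is roughly what this setup looks like with PyMongo; the database, collection, and field names other than latitude/longitude are hypothetical:

```
# Minimal sketch of the setup described above, using PyMongo.
# "telemetry"/"device_events" and device_id are assumed names,
# not part of the original question.
from datetime import datetime, timezone

from pymongo import ASCENDING, MongoClient

client = MongoClient("mongodb://localhost:27017")
events = client["telemetry"]["device_events"]

# Two secondary indexes; the _id index exists automatically,
# giving three indexes per document in total.
events.create_index([("latitude", ASCENDING)])
events.create_index([("longitude", ASCENDING)])

# One ~300-byte document per submission, every 10 seconds.
events.insert_one({
    "device_id": "dev-000001",        # hypothetical field
    "latitude": 40.7128,
    "longitude": -74.0060,
    "recorded_at": datetime.now(timezone.utc),
})
```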

Am I assuming something wrong? What is the right approach to manage the documents in this case?

Best Answer

If your base document with all the data is 300 bytes, then in the worst-case storage scenario your indexes are each 300 bytes as well (the same data, just sorted differently), which gives you 900 bytes of memory used per document for indexes plus 300 bytes for the base document.

1440 events * 1200 bytes (base document + 3 indexes at 300 bytes each) = 1,728,000 bytes

1,728,000 bytes / 1024 = 1,687.5 kilobytes

1,687.5 kilobytes / 1024 ≈ 1.65 megabytes

So if you are firing off 1440 events per day per device, each single device will take up about 1.65 MB of memory per day, plus a bit more for metadata overhead. If you are firing off that many events for 1,000,000 devices, it will be:

1,647,949.22 MB / 1024 ≈ 1,609.32 GB

so roughly 1.6 TB of RAM will be needed to fit a single day's data fully into memory without utilizing any compression.
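As a quick sanity check, here is the same back-of-envelope arithmetic in Python; the constants are taken straight from the numbers above:

```
# Reproduce the worst-case RAM estimate from the answer above.
BASE_DOC_BYTES = 300
INDEX_COUNT = 3            # _id, latitude, longitude
EVENTS_PER_DAY = 1_440     # one event every 10 s for 4 h
DEVICES = 1_000_000

# Worst case: each index entry is as large as the document itself.
bytes_per_event = BASE_DOC_BYTES * (1 + INDEX_COUNT)          # 1200 bytes
per_device_mb = EVENTS_PER_DAY * bytes_per_event / 1024 / 1024
fleet_gb = per_device_mb * DEVICES / 1024

print(f"{per_device_mb:.2f} MB per device per day")   # ~1.65 MB
print(f"{fleet_gb:.2f} GB for 1M devices per day")    # ~1609.33 GB (~1.6 TB)
```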

You can deal with the RAM requirement by scaling out with shards, or by finding a few very expensive systems that can handle that much RAM (note I haven't heard of one, and I'm not sure how many CPUs you'd need; this isn't realistic). You can shard, but then you'd typically want 3 hosts per shard, and that comes with its own operational overhead. It also eats up a lot of power and data center space.
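If you do go the sharding route, the setup through a mongos router looks roughly like this with PyMongo; the namespace and the hashed shard key here are illustrative assumptions, not a recommendation:

```
# Hedged sketch: spread the events collection across a sharded cluster.
# "telemetry.device_events" and the device_id shard key are assumptions.
from pymongo import MongoClient

client = MongoClient("mongodb://mongos-host:27017")  # connect through mongos

# Enable sharding on the database, then shard the collection.
client.admin.command("enableSharding", "telemetry")
client.admin.command(
    "shardCollection",
    "telemetry.device_events",
    key={"device_id": "hashed"},  # hashed key spreads writes across shards
)
```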

You can also compromise and accept the fact that you will read from disk, if your app is OK with that. Here are some other options.

There are all-flash SANs out there that measure response times in microseconds, while RAM delivers data in nanoseconds. That is orders of magnitude slower than RAM, but it is still insanely fast and might be just fine for your requirements. The problem with these is that you need top-of-the-line Fibre Channel HBAs to get the throughput.

There are Fusion-io cards that could work very well. These connect over the PCI-E bus, so throughput is no problem and you don't need expensive HBAs. They aren't cheap, though, and if your app is very latency-sensitive they still may not be fast enough. Fusion-io will typically give you a loaner card to test, and they always make me drool. They have a marketing page on it here.