MongoDB – Maintaining MongoDB for a huge list of message “documents”

management, mongodb, performance

Intro:

I am working on a live messaging app right now and it's almost complete. The database strategy is in place: when a new message is created, it is sent as an object that fits my MongoDB model. Nice.


Theory:

I am no expert (obviously), but I have one idea in mind: somehow limit the list of documents to maybe 1,000 and export the rest somewhere (not a million-dollar idea, just brainstorming). That's really it, lol!


Question:

So I am asking: what is the best way to manage a huge list of documents, most of which (the ones at the tail end) will never be needed and are only saved so someone can go back over old conversations? Thinking about it, in most messaging apps (Skype, text, etc.) old messages disappear after a while when you scroll far up. So I am thinking my theory isn't so bad?

Is there a standard for handling a huge list? Should I just leave it alone? I feel like it would become hard to manage.

Best Answer

A capped collection would allow you to maintain a relatively small list, though you would want a decent buffer, or to be very confident about the size of your documents, because the cap is defined in bytes. The other option would be a TTL collection (a regular collection with a TTL index that deletes documents once they reach a set age), but that would be more prone to fragmentation since it does a lot of deletes. You would have to deal with that fragmentation regularly with a repair, a compact, or a resync.
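To make the two options concrete, here is a minimal sketch in Python, assuming the pymongo driver and a `db` handle you have already obtained from `MongoClient`. The collection name `recent_messages`, the `created_at` field, the 512-byte average document size, and the 2x headroom are made-up values for illustration, not recommendations.

```python
def capped_options(max_docs, avg_doc_bytes, headroom=2.0):
    """Build create_collection options for a capped collection.

    The byte size is padded by `headroom` (an assumed safety factor) because
    a capped collection is limited by bytes first, document count second.
    """
    size_bytes = int(max_docs * avg_doc_bytes * headroom)
    return {"capped": True, "size": size_bytes, "max": max_docs}


def create_recent_messages(db, name="recent_messages",
                           max_docs=1000, avg_doc_bytes=512):
    # `db` is expected to be a pymongo Database; create_collection
    # passes capped/size/max through to the server's create command.
    return db.create_collection(name, **capped_options(max_docs, avg_doc_bytes))


def create_ttl_index(collection, field="created_at",
                     keep_seconds=30 * 24 * 3600):
    # TTL alternative: the server deletes a document once the date stored
    # in `field` is older than keep_seconds. The field must hold a date.
    return collection.create_index(field, expireAfterSeconds=keep_seconds)
```

When both `size` and `max` are given, the oldest documents are dropped as soon as either limit is reached, which is why the byte size is the one that gets the buffer.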

Based on what you outline, this sounds very much like what people have done when implementing a message queue with MongoDB: you use the capped or TTL collection for your immediate needs, then you decide what level of durability you want for the documents long term. That is, you store them in another collection, another database, or even another instance of MongoDB entirely before they are removed. At a larger scale I have even seen people run long-term and short-term shards, but that should be a long way down the line from your initial attempts.
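As a rough illustration of the "copy them elsewhere before they are removed" step, here is a small Python sketch, again assuming pymongo collection handles. The function name `archive_new_messages` is hypothetical, and the approach assumes `_id` values only ever increase (roughly true for default ObjectIds generated in one place, but not guaranteed across many clients), so treat it as a starting point rather than a finished archiver.

```python
def archive_new_messages(recent, archive, last_id=None):
    """Copy documents from `recent` into `archive`.

    Only documents with an _id greater than `last_id` are copied, and the
    last _id copied is returned so the next run can resume from there.
    """
    query = {} if last_id is None else {"_id": {"$gt": last_id}}
    for doc in recent.find(query).sort("_id", 1):
        # Upsert by _id so re-running the job never creates duplicates.
        archive.replace_one({"_id": doc["_id"]}, doc, upsert=True)
        last_id = doc["_id"]
    return last_id
```

You would run something like this on a schedule that is comfortably shorter than the time it takes the short-term collection to roll over, and keep the returned `last_id` somewhere between runs.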

If you do go down the path of a capped collection, with "long term" storage elsewhere, it is often beneficial to have the capped collection on one disk and the other data on another (capped collections tend to have sequential access patterns, whereas regular collections tend towards random access). That means using different databases, different disk mount points, and the directoryperdb option.
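For what the directoryperdb layout looks like in practice, here is a tiny Python sketch. The `/data/db` path and the database names `queue` and `archive` are made-up examples. With `--directoryperdb`, mongod keeps each database's files in a subdirectory of the data path named after the database, and that subdirectory is where a separate disk can be mounted.

```python
import posixpath


def mongod_command(dbpath):
    # --directoryperdb is the command-line form of the option; in a
    # YAML config file the same setting is storage.directoryPerDB.
    return ["mongod", "--dbpath", dbpath, "--directoryperdb"]


def database_dir(dbpath, db_name):
    # Directory where this database's files live, and therefore the
    # place to mount a dedicated disk for it.
    return posixpath.join(dbpath, db_name)


print(" ".join(mongod_command("/data/db")))
print(database_dir("/data/db", "queue"))    # short-term, capped collection
print(database_dir("/data/db", "archive"))  # long-term storage
```

The mount points need to be in place before the data is written, so this is easiest to set up on a fresh deployment.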

There are a lot of potential options here but thankfully several of the approaches have been publicised and documented. To start I would take a look at these:

http://www.mongodb.com/presentations/mongodb-message-queue
http://captaincodeman.com/2011/05/28/simple-service-bus-message-queue-mongodb/
http://blog.mongodb.org/post/29495793738/pub-sub-with-mongodb

Hopefully that's enough to get you started.