MongoDB – Ideal environment and cost for running a medium-size MongoDB server


I know some people will say this question does not belong here, but I didn't find a better place on the internet to ask it. Please migrate it to an appropriate place if you think it should be.
I am using MongoDB to store product data. The text data (product names, categories, attributes) accounts for about 30 GB, and the images account for about 200 GB. I am paying 184 USD for a dedicated server (from SingleHop) with a quad-core processor, 8 GB of RAM and a 500 GB HDD. Is this setup OK for a medium-traffic site (5-7K visitors a day)? I have heard that MongoDB works best when it is sharded (because of the global read-write lock), but I have never worked with clusters. I have always worked with VPSs, and this is my first time with a dedicated server. Please advise me about the setup and the costs too. I live in India and expect most of my visitors to be from India.

Thanks.

Best Answer

There really is no "ideal" environment and cost for running MongoDB (or any other database, for that matter). There will be very cheap solutions that give you enough space but not enough RAM, and there will be mid-range options where you have enough RAM most of the time, but at busier periods you exceed the memory limits and the disk is too slow to cope with the increased page-fault activity.

As always, it will be a trade-off between what you can afford and what is best. In terms of general recommendations:

The number of cores and the amount of disk space will matter less than your available RAM and whether or not you can keep your working data set (active data plus indexes) in RAM; that is the key to performance. You won't really be able to tell until you start serving real traffic, but with decent load testing you should be able to estimate it.
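As a rough illustration of the working-set check, here is a minimal Python sketch. The helper name, the 1 GB reserved for the operating system, and the example inputs (what fraction of the 30 GB of text data is "active", and a 1.5 GB index size) are hypothetical assumptions, not measurements; real figures should come from testing.

```python
def working_set_fits(active_data_gb: float, index_gb: float, ram_gb: float,
                     os_reserve_gb: float = 1.0) -> bool:
    """Return True if active data plus indexes fit in RAM left after an OS reserve.

    Hypothetical helper: the working set is approximated as the frequently
    accessed portion of the data plus all indexes.
    """
    if min(active_data_gb, index_gb, ram_gb, os_reserve_gb) < 0:
        raise ValueError("sizes must be non-negative")
    working_set_gb = active_data_gb + index_gb
    return working_set_gb <= ram_gb - os_reserve_gb


if __name__ == "__main__":
    ram_gb = 8  # RAM of the server described in the question
    # Hypothetical scenarios for the 30 GB of text data; indexes assumed to be 1.5 GB.
    for active_fraction in (0.1, 0.25, 0.5):
        active_gb = 30 * active_fraction
        fits = working_set_fits(active_gb, index_gb=1.5, ram_gb=ram_gb)
        print(f"{active_fraction:.0%} active: {active_gb:.1f} GB + 1.5 GB indexes, fits: {fits}")
```

With these made-up inputs only the 10% scenario fits in 8 GB, which is why measuring how much of the data is actually hot matters more than the core count. If the 200 GB of images are also read frequently, the hot portion of them belongs in the working set too.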

I would recommend using MMS (the MongoDB Monitoring Service) to track the stats. It's free, and it includes a memory graph that tracks your resident memory usage, along with many other metrics.
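The resident-memory figure can also be pulled directly from the server. A minimal sketch, assuming Python with the pymongo driver and a mongod at a hypothetical `localhost:27017`; it reads `mem.resident` (reported in megabytes) and, where the platform reports it (e.g. Linux), `extra_info.page_faults` from the `serverStatus` command. The helper name and the 8 GB RAM figure are assumptions for illustration.

```python
from typing import Any, Mapping, Optional


def memory_summary(status: Mapping[str, Any], ram_mb: int) -> dict:
    """Summarize resident memory and page faults from a serverStatus document.

    Hypothetical helper: `status` is the dict returned by the serverStatus
    command. page_faults is only reported on some platforms, so it may be None.
    """
    resident_mb = status["mem"]["resident"]
    page_faults: Optional[int] = status.get("extra_info", {}).get("page_faults")
    return {
        "resident_mb": resident_mb,
        "resident_pct_of_ram": round(100.0 * resident_mb / ram_mb, 1),
        "page_faults": page_faults,
    }


if __name__ == "__main__":
    from pymongo import MongoClient  # requires the pymongo package

    client = MongoClient("mongodb://localhost:27017")  # hypothetical address
    status = client.admin.command("serverStatus")
    print(memory_summary(status, ram_mb=8 * 1024))  # 8 GB server from the question
```

`page_faults` is a cumulative counter since startup, so compare two samples taken some time apart; a rising rate during busy periods is the symptom of a working set that no longer fits in RAM.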

FYI: there is no global read lock, and as of 2.2 (a release candidate is up for testing as of this writing), the global write lock has been replaced by a database-level lock. For an in-depth discussion, have a look at the relevant concurrency presentations on the 10gen website.

Another thing to make sure of is that you run more than one MongoDB instance: it is highly recommended that you run a replica set (at minimum a primary, a secondary and an arbiter) rather than a single instance.
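A minimal sketch of that primary/secondary/arbiter layout, assuming Python with pymongo 3.11 or later (for `directConnection`). The hostnames, the set name `rs0` and the helper name are hypothetical; each mongod is assumed to have been started with `--replSet rs0`. The config document has the shape expected by the `replSetInitiate` command.

```python
from typing import Any, Dict, List


def replica_set_config(name: str, data_hosts: List[str], arbiter_host: str) -> Dict[str, Any]:
    """Build a replSetInitiate config: data-bearing members plus one arbiter.

    Hypothetical helper; member _id values only need to be unique within the set.
    """
    members = [{"_id": i, "host": host} for i, host in enumerate(data_hosts)]
    # The arbiter votes in elections but stores no data.
    members.append({"_id": len(members), "host": arbiter_host, "arbiterOnly": True})
    return {"_id": name, "members": members}


if __name__ == "__main__":
    from pymongo import MongoClient  # requires the pymongo package

    config = replica_set_config(
        "rs0",
        ["db1.example.com:27017", "db2.example.com:27017"],  # hypothetical hosts
        "arbiter.example.com:27017",
    )
    # Connect directly to one not-yet-initiated member and initiate the set.
    seed = MongoClient("db1.example.com", 27017, directConnection=True)
    seed.admin.command("replSetInitiate", config)

    # Applications connect with the replica set name so the driver can
    # follow the primary after a failover.
    app = MongoClient("mongodb://db1.example.com:27017,db2.example.com:27017/?replicaSet=rs0")
```

The arbiter is what lets a two-data-node set still elect a new primary when one data node goes down, without paying for a third full copy of the data.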

That much is correct: sharding can help you scale out horizontally, because it lets you add resources to your cluster without having to increase the resources available on any particular host. It is not really correct, though, to say that MongoDB runs "best" when sharded. Sharding has overheads (you need more servers to run the config servers, mongos processes, etc.).
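To make that overhead concrete, here is a back-of-the-envelope count of MongoDB processes. The assumptions are mine, not the answer's: each shard is itself a primary/secondary/arbiter replica set as recommended above, a production sharded cluster uses three config servers, and one mongos is counted (in practice mongos often runs on the application servers). The helper name is hypothetical.

```python
def process_count(replica_sets: int, members_per_set: int = 3, sharded: bool = False,
                  config_servers: int = 3, mongos: int = 1) -> int:
    """Count mongod/mongos processes for a deployment (hypothetical sizing helper).

    An unsharded deployment is a single replica set with no config servers or mongos.
    """
    if replica_sets < 1:
        raise ValueError("need at least one replica set")
    total = replica_sets * members_per_set
    if sharded:
        # Sharding adds the metadata (config) servers and at least one router.
        total += config_servers + mongos
    return total


if __name__ == "__main__":
    print(process_count(1))                # single replica set: 3
    print(process_count(2, sharded=True))  # two shards: 10
```

Under these assumptions, moving from one replica set to a two-shard cluster takes the deployment from 3 processes to 10, each of which needs a host to run on.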

MongoDB runs best when your working set fits in RAM and your disk subsystem is fast enough to keep up with the amount of data you want to write to disk. Whether that requires a single replica set or a sharded environment will very much depend on how you use it.