Are you familiar with the concept of a key/value pair? Presuming you're familiar with Java or C#, it appears in those languages as a map/hash/datatable/KeyValuePair (the last being the C# type).
The way it works is demonstrated in this little sample chart:
Color Red
Age 18
Size Large
Name Smith
Title The Brown Dog
You have a key (left) and a value (right) ... notice the value can be a string, an int, or the like. Most KVP implementations allow you to store any object as the value, because it's just a value.
Since you'll always have a unique key for a particular object that you want to return, you can just query the store for that unique key and get the result back from whichever node holds the object. This is why it's good for distributed systems; there are other things involved, such as polling for the first n nodes whose returned values agree with each other.
Now, my example above is very simple, so here's a slightly better version of the KVP idea:
user1923_color Red
user1923_age 18
user3371_color Blue
user4344_color Brackish
user1923_height 6' 0"
user3371_age 34
So, as you can see, the simple key-generation scheme is to concatenate "user", the user's unique number, an underscore, and the attribute name. Again, this is a simple variation, but I think we begin to understand that as long as we can define the part on the left and have it be consistently formatted, we can pull out the value.
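To make that concrete, here is a minimal sketch in plain JavaScript (the makeKey helper and the in-memory store object are just illustrations, not any particular product's API) of building such keys and pulling values back out:
// Hypothetical helper: "user" + the user's unique number + "_" + the attribute name
var makeKey = function(userId, attribute) {
    return "user" + userId + "_" + attribute;
};

var store = {};                              // stand-in for the key/value store
store[makeKey(1923, "color")] = "Red";       // put
store[makeKey(1923, "age")]   = 18;
store[makeKey(3371, "color")] = "Blue";

print(store[makeKey(1923, "color")]);        // get -> Red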
Notice that there's no real restriction on the key (OK, there can be some limitations, such as text-only keys) or on the value (there may be a size restriction), but so far I've not dealt with really complex systems. Let's try and go a little further:
app_setting_width 450
user1923_color Red
user1923_age 18
user3371_color Blue
user4344_color Brackish
user1923_height 6' 0"
user3371_age 34
error_msg_457 There is no file %1 here
error_message_1 There is no user with %1 name
1923_name Jim
user1923_name Jim Smith
user1923_lname Smith
Application_Installed true
log_errors 1
install_path C:\Windows\System32\Restricted
ServerName localhost
test test
test1 test
test123 Brackish
devonly
wonderwoman
value key
You get the idea... all those would be stored in one massive "table" on the distributed nodes (there's math behind it all) and you would just ask the distributed system for the value you need by name.
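As a rough illustration of that math (a simplified sketch, not how any particular store actually does it), a client can hash the key to decide which node owns it, so every client lands on the same node for the same key:
var nodes = ["nodeA", "nodeB", "nodeC"];     // hypothetical node names

var hashKey = function(key) {
    var h = 0;
    for (var i = 0; i < key.length; i++) {
        h = (h * 31 + key.charCodeAt(i)) % 1000000007;   // simple string hash
    }
    return h;
};

var nodeFor = function(key) {
    return nodes[hashKey(key) % nodes.length];           // same key always maps to the same node
};

print(nodeFor("user1923_color"));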
At the very least, that's my understanding of how it all works. I may have a few things wrong, but that's the basics.
obligatory wikipedia link http://en.wikipedia.org/wiki/Associative_array
In general, no, they are not distributed in order. However, there is a way that this (or at least approximately this) distribution can occur during normal operations. To explain:
If you have all of your data on a single shard (a-z) and then add a second shard, the balancer will redistribute the data to the new shard by migrating chunks.
The way the balancer picks a chunk to migrate is, in general, pretty simple: it will pick the lowest available range on the shard with the most chunks (the "a" chunk, for example) and move it to the shard with the fewest chunks. Rinse and repeat until the chunks are balanced.
Hence, if you have all your data on one shard and add a second shard, the low range chunks (a-m) will all be moved from the original shard to the new shard, and you can end up with this almost-in-order distribution.
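A toy simulation of that behavior (just an illustration of the chunk-counting logic described above, not the actual balancer code) makes the resulting split obvious:
// Repeatedly move the lowest-range chunk from the fullest shard to the emptiest one
var shards = { shard0000 : ["a","b","c","d","e","f","g","h"], shard0001 : [] };

var chunkCount = function(s) { return shards[s].length; };

while (chunkCount("shard0000") - chunkCount("shard0001") > 1) {
    var chunk = shards.shard0000.shift();   // lowest range on the fullest shard
    shards.shard0001.push(chunk);           // lands on the emptiest shard
}
printjson(shards);  // shard0001 ends up with the low (a-d) ranges, shard0000 keeps e-h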
In your example, you just end up with an oddly uniform distribution of data, no real harm done. However, it might be problematic from a performance perspective for range-based queries across a large portion of the data. Think about a query that would walk the index in order - it would only ever hit one shard at a time (first the new shard, then the old one) rather than using the resources of both, as it would with a better distribution of data.
In other scenarios this behavior can lead to a particularly poor data distribution. Let's say you had a time-based shard key (bad for various reasons, but still quite common) and you deleted old data on a regular basis. Combine that with the scenario above and the new shard would eventually have no data on it (the data in the low-range, i.e. old, chunks would all be deleted). Remember that an empty chunk (no data) counts just as much as a 60MB chunk in terms of balancing (the balancer is purely looking at the number of chunks on each shard, not the amount of data in them).
Also, none of this would happen if you had started with 2 shards and then inserted your data, because the splits and migrations would be more random and more distributed in general. MongoDB will also attempt to move the maximum chunk to help this distribution happen.
To summarize - if you have spotted this type of distribution problem, you can take steps to fix it by moving chunks around yourself. Take a look at the moveChunk command (it's generally a good idea, but not required, to have the balancer off when you do this or you can have trouble getting the required lock).
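For example, assuming a sharded test.users collection and a target shard named shard0001, a manual migration from the mongos shell would look something like this:
sh.stopBalancer();    // avoid fighting the balancer for the migration lock

// move the chunk containing _id 10000 to shard0001
db.adminCommand({ moveChunk : "test.users", find : { _id : 10000 }, to : "shard0001" });

sh.startBalancer();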
The easiest way to view all this is to run the sharding status command and pass an argument of true, i.e. sh.status(true). This will print details of all sharded collections, where the chunks live, etc. It pulls its information from the config database and uses a simple Map Reduce to aggregate the information.
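If you want to look at the raw data behind that summary, you can query the config database directly from the mongos, for example (test.users is just an example namespace):
mongos> use config
mongos> db.collections.findOne({_id : "test.users"})           // the shard key for the collection
mongos> db.chunks.find({ns : "test.users"}).sort({min : 1})    // every chunk, in shard key order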
Armed with that information, you can also inspect the changelog collection to determine what has been migrated and when it was done (this will also be recorded in your primary mongod logs, but can be tough to parse out on busy systems).
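For example, to pull the recent migrations for a namespace out of the changelog (again using test.users as the example):
mongos> use config
mongos> db.changelog.find({ns : "test.users", what : /moveChunk/}).sort({time : -1}).limit(10)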
Finally, I realize this is somewhat beyond the scope of this answer, but I have written a couple of JavaScript functions (that you can use from the MongoDB shell) that will let you determine the object and data distribution on your shards for a given collection, something not easily available elsewhere. Hopefully you will find them useful :)
Please Note: these functions can be intensive to run, so use with caution in a production environment.
This first function prints out the chunk information in CSV format for a given namespace. It uses the estimate option of the datasize command to use counts (rather than a scan of all the objects) to figure out the size of each chunk. Removing the estimate option would give a more accurate result (particularly if your document size varies), but if the data set is not in RAM it can be very resource intensive.
AllChunkInfo = function(ns){
    var chunks = db.getSiblingDB("config").chunks.find({"ns" : ns}).sort({min : 1}); // this will return all chunks for the ns ordered by min
    // some counters for overall stats at the end
    var totalChunks = 0;
    var totalSize = 0;
    var totalEmpty = 0;
    print("ChunkID,Shard,ChunkSize,ObjectsInChunk"); // header row
    // iterate over all the chunks, print out info for each
    chunks.forEach(
        function printChunkInfo(chunk) {
            var db1 = db.getSiblingDB(chunk.ns.split(".")[0]); // get the database we will be running the command against later
            var key = db.getSiblingDB("config").collections.findOne({_id : chunk.ns}).key; // will need this for the dataSize call
            // dataSize returns the info we need on the data, but using the estimate option to use counts is less intensive
            var dataSizeResult = db1.runCommand({datasize : chunk.ns, keyPattern : key, min : chunk.min, max : chunk.max, estimate : true});
            // printjson(dataSizeResult); // uncomment to see how long it takes to run and status
            print(chunk._id+","+chunk.shard+","+dataSizeResult.size+","+dataSizeResult.numObjects);
            totalSize += dataSizeResult.size;
            totalChunks++;
            if (dataSizeResult.size == 0) { totalEmpty++ }; // count empty chunks for summary
        }
    )
    print("***********Summary Chunk Information***********");
    print("Total Chunks: "+totalChunks);
    print("Average Chunk Size (bytes): "+(totalSize/totalChunks));
    print("Empty Chunks: "+totalEmpty);
    print("Average Chunk Size (non-empty): "+(totalSize/(totalChunks-totalEmpty)));
}
You can call this on any sharded namespace from a mongos as follows:
mongos> AllChunkInfo("test.users");
ChunkID,Shard,ChunkSize,ObjectsInChunk
test.users-_id_MinKey,shard0000,0,0
test.users-_id_10000.0,shard0000,525420,26271
test.users-_id_36271.0,shard0001,1120640,56032
test.users-_id_92303.0,shard0001,153940,7697
***********Summary Chunk Information***********
Total Chunks: 4
Average Chunk Size (bytes): 450000
Empty Chunks: 1
Average Chunk Size (non-empty): 600000
The next function lets you find the information for a particular chunk, again using an estimate. It is somewhat safer to run without the estimate option, but should still be used with caution:
ChunkInfo = function(ns, id){
    var configDB = db.getSiblingDB("config");
    var db1 = db.getSiblingDB(ns.split(".")[0]);
    var key = configDB.collections.findOne({_id : ns}).key;
    var chunk = configDB.chunks.find({"_id" : id}).limit(1).next();
    var dataSizeResult = db1.runCommand({datasize : chunk.ns, keyPattern : key, min : chunk.min, max : chunk.max, estimate : true});
    print("***********Chunk Information***********");
    printjson(chunk);
    print("Chunk Size: "+dataSizeResult.size);
    print("Objects in chunk: "+dataSizeResult.numObjects);
}
Sample Output:
mongos> ChunkInfo("test.users", "test.users-_id_10000.0")
***********Chunk Information***********
{
    "_id" : "test.users-_id_10000.0",
    "lastmod" : Timestamp(3000, 0),
    "lastmodEpoch" : ObjectId("518373f0a962406e4467b054"),
    "ns" : "test.users",
    "min" : {
        "_id" : 10000
    },
    "max" : {
        "_id" : 36271
    },
    "shard" : "shard0000"
}
Chunk Size: 525420
Objects in chunk: 26271
Best Answer
There are a number of reasons not to use MongoDB as a pure key-value store, and there are some reasons to consider it. Mongo is optimized as a document store - it indexes all the fields in a document, and has rich primitives for JSON objects and hierarchies. You can use it as a key-value store, but the single-threaded nature means you won't be getting good performance out of your hardware. Storing simple blobs removes a number of the benefits of Mongo. Mongo has algorithms where it splits data chunks as you insert, which can create lag. Mongo's system for re-partitioning is cumbersome as well. The benefit of a key-value system is that it should be really simple and really fast, so you can scale up and keep server and management costs down.
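If you do decide to use Mongo that way, the pattern is simply to make the key the _id (the kv collection name here is just an example), so every lookup uses the built-in _id index:
db.kv.insertOne({ _id : "user1923_color", value : "Red" });   // the key is the _id
db.kv.findOne({ _id : "user1923_color" });                    // lookup by key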
Other systems are more tuned for key-value use. You mention Redis, one of the best key-value stores, but the repartitioning/clustering in Redis is still alpha-level, and it requires that the data set fit in DRAM. Some people build their own sharding and partitioning layers on top of Redis - this is very common among some of the larger Chinese social networks.
Cassandra is sometimes used as a key-value store. This isn't the best use of Cassandra, as Cassandra's "super column families" provide rich indexing. Cassandra isn't as fast as databases written in C/C++ like Redis and Mongo, but it does have strong clustering capabilities.
One store you should strongly consider in this area is Aerospike. Aerospike has very flexible cluster management - adding a single node by just bringing it up - as well as support for both DRAM and SSD/Flash - and easy replication for HA. It's in use at very high levels of scale by advertising platform companies who need huge key value stores. Aerospike has a free version that supports node sizes to 200G.
Couchbase (formerly Membase) is another system to look at for key-value use. It provides some clustering primitives and is focused more on in-memory use.