There is currently no built-in way to do this, so a small function is needed. For the purposes of this answer I created a 2-shard cluster with ~1 million documents as per these instructions. I then used this function to examine those documents:
AllChunkInfo = function(ns, est) {
    // all chunks for the namespace, ordered by min
    var chunks = db.getSiblingDB("config").chunks.find({"ns": ns}).sort({min: 1});
    // counters for the overall stats printed at the end
    var totalChunks = 0;
    var totalSize = 0;
    var totalEmpty = 0;
    print("ChunkID,Shard,ChunkSize,ObjectsInChunk"); // header row
    // iterate over all the chunks, print out info for each
    chunks.forEach(
        function printChunkInfo(chunk) {
            // the database we will run the dataSize command against
            var db1 = db.getSiblingDB(chunk.ns.split(".")[0]);
            // the shard key pattern, needed for the dataSize call
            var key = db.getSiblingDB("config").collections.findOne({_id: chunk.ns}).key;
            // dataSize returns the info we need on the data; the estimate option
            // uses counts and average object size, which is far less intensive
            var dataSizeResult = db1.runCommand({datasize: chunk.ns, keyPattern: key, min: chunk.min, max: chunk.max, estimate: est});
            // printjson(dataSizeResult); // uncomment to see how long it takes to run and status
            print(chunk._id + "," + chunk.shard + "," + dataSizeResult.size + "," + dataSizeResult.numObjects);
            totalSize += dataSizeResult.size;
            totalChunks++;
            if (dataSizeResult.size == 0) { totalEmpty++; } // count empty chunks for the summary
        }
    );
    print("***********Summary Chunk Information***********");
    print("Total Chunks: " + totalChunks);
    print("Average Chunk Size (bytes): " + (totalSize / totalChunks));
    print("Empty Chunks: " + totalEmpty);
    print("Average Chunk Size (non-empty): " + (totalSize / (totalChunks - totalEmpty)));
};
It's pretty basic at the moment, but it does the job. I have also added it on github and may expand it further there. For now though, it will do what is needed. On the test data set described at the start, the output looks like this (some data removed for brevity):
mongos> AllChunkInfo("chunkTest.foo", true);
ChunkID,Shard,ChunkSize,ObjectsInChunk
chunkTest.foo-_id_MinKey,shard0000,0,0
chunkTest.foo-_id_0.0,shard0000,599592,10707
chunkTest.foo-_id_10707.0,shard0000,1147832,20497
chunkTest.foo-_id_31204.0,shard0000,771568,13778
chunkTest.foo-_id_44982.0,shard0000,771624,13779
// omitted some data for brevity
chunkTest.foo-_id_940816.0,shard0000,1134224,20254
chunkTest.foo-_id_961070.0,shard0000,1145032,20447
chunkTest.foo-_id_981517.0,shard0000,1035104,18484
***********Summary Chunk Information***********
Total Chunks: 41
Average Chunk Size (bytes): 1365855.024390244
Empty Chunks: 1
Average Chunk Size (non-empty): 1400001.4
To explain the arguments passed to the function:
The first argument is the namespace to examine (a string), and the second (a boolean) controls whether the estimate option is used. For any production environment I recommend estimate:true - without it, all of the data has to be examined, which means pulling it into memory, and that is expensive. While the estimate:true version is not free (it uses counts and average object sizes), it is at least reasonable to run even on a large data set. The estimate can be a little off if object sizes are skewed on some shards, making the average size unrepresentative, but that is generally pretty rare.
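For reference, the underlying dataSize command can also be run by hand from the mongo shell. The namespace, key pattern, and chunk bounds below are illustrative, taken from the sample output above; run this against mongos on your own cluster with your own values:

```javascript
// Run from the mongo shell against mongos; values here are illustrative
db.getSiblingDB("chunkTest").runCommand({
    dataSize: "chunkTest.foo",   // namespace to measure
    keyPattern: { _id: 1 },      // shard key pattern for the bounds
    min: { _id: 0 },             // chunk lower bound
    max: { _id: 10707 },         // chunk upper bound
    estimate: true               // use count * avgObjSize rather than scanning
})
```

With estimate:false (the default) the same command scans every document in the range, which is where the memory cost described above comes from.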
The array contains 348 empty documents to begin with, and over the course of a week these array elements will have sub-documents inserted (if empty) and then subsequently updated (if they already exist). The sub-documents are approximately 100 bytes in size and are not indexed.
One consideration with this use case is that your documents are consistently growing. MongoDB uses a record allocation (or padding) strategy to allow documents to grow in place. For example, if your document starts off at 1,000 bytes, MongoDB 2.6 or newer will round this up to a 1,024 byte record allocation for MMAP (as per the default Power of 2 Sizes strategy). Updates that don't grow the document beyond its current record allocation are more efficient for the server to execute.
However, if you added 100 bytes to a document which was initially 1000 bytes, the document would have to be moved to a new record allocation in storage (and associated index entries would also have to be updated). So in this example, the next allocation for a 1100 byte document would be 2048 bytes (allowing for ~9 more 100 byte fields to be added before a new record allocation was needed for this document). Indexes in MongoDB include the storage location of the document, so a document move will result in an update for every index entry referencing that document.
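The allocation arithmetic above can be sketched in plain JavaScript. This is a toy model of the power of 2 strategy for illustration, not MongoDB's actual allocator:

```javascript
// Toy model of power-of-2 record allocation: round the document size
// up to the next power-of-2 bucket (minimum bucket size assumed here).
function powerOf2Allocation(docSizeBytes) {
    var alloc = 32; // assume a small minimum bucket for the sketch
    while (alloc < docSizeBytes) {
        alloc *= 2;
    }
    return alloc;
}

console.log(powerOf2Allocation(1000)); // 1024 - the example above
console.log(powerOf2Allocation(1100)); // 2048 - after growing past 1024
```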
You can check the frequency of document moves by looking at the nmoved value for slow updates (or by enabling increased levels of logging / system profiling). Frequent document moves can definitely have a performance impact. Common strategies include either reconsidering the data model (eg. moving the growing portion of the document to a separate collection, if appropriate) or adding manual padding to the initial document allocation. The default power of 2 allocation strategy is designed to avoid the need for manual padding in most cases, but if your documents start small and grow quickly you might be able to avoid some initial document moves by padding them up front.
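To make the trade-off concrete, here is a small simulation (again a toy model, assuming the power-of-2 buckets described above) counting how many record moves a steadily growing document would incur, with and without manual padding:

```javascript
// Toy simulation: count record moves for a document growing in fixed
// increments under an assumed power-of-2 record allocation model.
function countMoves(initialBytes, growBy, steps) {
    var alloc = 32;
    while (alloc < initialBytes) alloc *= 2; // initial allocation bucket
    var size = initialBytes, moves = 0;
    for (var i = 0; i < steps; i++) {
        size += growBy;
        if (size > alloc) {                  // no longer fits: document move
            while (alloc < size) alloc *= 2; // new, larger allocation
            moves++;
        }
    }
    return moves;
}

// a 1000 byte document growing by 100 bytes, 20 times:
console.log(countMoves(1000, 100, 20)); // 2 moves (at 1100 and 2100 bytes)
// padded up front to 3000 bytes (a 4096 byte allocation): no moves
console.log(countMoves(3000, 100, 10)); // 0
```

In this model the padded document absorbs all of its growth within the initial allocation, which is the effect manual padding is aiming for.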
So my question is, when the document is flushed to disk, what actually gets written, the entire document or just the sub-document that has been updated?
The answer will depend on the size of your document and the nature of updates since the last background flush. I'll assume you are using a default configuration with MMAP storage engine and journal enabled.
By default data changes are written twice: once to fast append-only journal files (committed to disk every 100ms) and again to a private view in memory (flushed to data files every 60s). The background flush process is a periodic asynchronous write of all pages that have been "dirtied" in memory since the last flush. Journal commit and background flush intervals can be influenced by both server configuration and write concerns. For a good overview of the process see How MongoDB’s Journaling Works.
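Both intervals are tunable. For example, in a YAML mongod configuration file (a sketch showing the defaults; the option names are from the 2.6+/3.0 YAML format):

```yaml
# illustrative mongod configuration fragment
storage:
  syncPeriodSecs: 60        # background flush interval (default 60s)
  journal:
    enabled: true
    commitIntervalMs: 100   # journal commit interval (default 100ms for MMAP)
```

Lowering these intervals trades more frequent I/O for a smaller window of unflushed changes.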
The MMAP storage engine will fetch the full document into memory before applying updates. The standard x86 page size is 4KiB so a single document may be represented by one or more pages -- or multiple documents may be part of a single page in memory.
So, if you are updating a single document the writes will include:
- all changes written to the journal
- all changes written to the oplog (if that node is part of replica set)
- any pages dirtied for that document since the last background flush
An important caveat is "since the last background flush". Multiple updates affecting the same pages within a given sync interval will effectively be batched.
If you're trying to get to the bottom of performance issues, then consistently high background flush times (particularly as a large or increasing percentage of the default 60s flush interval) are definitely of concern, but should be reviewed in the context of other metrics such as page faults, I/O stats, and lock percentage. I would also review the MongoDB Production Notes for general tips, and upgrade to the latest MongoDB production release for your major version (i.e. the latest 2.6.x or 3.0.x if there's a newer x than your current version).
Best Answer
Left to its own devices, no, MongoDB will not move those unsharded databases to a different primary shard - the automatic balancing only applies to chunks from sharded collections.
It will round-robin through your shards as databases are created, which spreads them out across all the shards from that perspective. If you had one shard originally and later expanded to many, the databases may have ended up concentrated on that original shard - the round-robin placement only applies when a database is created, not to the collections inside it.
Once the databases are created, and assuming you can predict what will be used and when, you can then move them to whatever shard you wish using the movePrimary command and distribute load accordingly:
http://www.mongodb.org/display/DOCS/movePrimary+Command
Naturally, this will be a quicker process if there is no data in the databases, but should still be possible after the fact.
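For example, from the mongo shell connected to mongos (the database and shard names here are illustrative):

```javascript
// Run against mongos; "myUnshardedDB" and "shard0001" are illustrative names
db.adminCommand({ movePrimary: "myUnshardedDB", to: "shard0001" })

// verify the new primary shard afterwards:
db.getSiblingDB("config").databases.findOne({ _id: "myUnshardedDB" })
```

Note that movePrimary copies the unsharded data to the new shard, so on a database with significant data it should be run during a maintenance window.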