1 - You can have more than one mongos instance and connect to whichever you want (the client driver should support that). Keep in mind, though, that a mongos is just a router: it only routes requests to the correct shard(s).
2 - Yes, a config server can run on the same machine as a primary/secondary; just don't put config server instances together on the same machine (you are required to have 3 of them precisely for redundancy).
3 - A replica set is a group consisting of 1 primary and N secondaries. If you are sharding, each group will be a shard. That means the single primary and all secondaries of the same group hold the same (replicated) data. Also, having more secondaries won't increase your write performance, as only the primary can perform inserts/updates/deletes. Considering this, the only way to increase write performance is to have more shards (horizontal scaling) and, of course, to choose a good shard key that balances the data across all shards (you don't want all your data in 1 shard while the others remain empty). For further info, check the official doc explaining the concept of a replica set and its members: http://docs.mongodb.org/manual/core/replica-set-members/
Also, the instances of a replica set shard (the primary and N secondaries) are not required to be on the same machine. You might want to put them on separate machines to increase redundancy and to distribute the load better.
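As a rough sketch (the replica set name and hostnames here are just placeholders), a three-member replica set spread across three machines would be initiated from the mongo shell like this:

    rs.initiate({
        _id: "rs_1",
        members: [
            { _id: 0, host: "host1.example.net:27017" },
            { _id: 1, host: "host2.example.net:27017" },
            { _id: 2, host: "host3.example.net:27017" }
        ]
    })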
A lot to go through here, so I'll take it piece by piece. First off, splitting:
I thought this meant that when a chunk hits 64mb, it splits into two
equal chunks both of size 32mb. That's what is demonstrated here. Is
that not correct?
That's not quite how it works. If you have a 64MB chunk and you manually run a splitFind command, you will get (by default) two chunks split at the mid-point. Auto-splitting is done differently though - the details are actually quite involved, but use what I explain below as a rule of thumb and you will be close enough.
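For reference, a manual split from the mongo shell looks something like this (the namespace and shard key are placeholders): sh.splitFind() splits the chunk containing the matching document at its mid-point, while sh.splitAt() splits at the exact key you supply:

    // split the chunk containing this document at its mid-point
    sh.splitFind("mydb.mycoll", { user_id: 12345 })

    // or split at an explicit shard key value
    sh.splitAt("mydb.mycoll", { user_id: 12345 })

Now, on to automatic splitting: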
Each mongos tracks (approximately) how much data it has seen inserted/updated for each chunk. When it sees that ~20% of the maximum chunk size (so 12-13MiB by default) has been written to a particular chunk, it will attempt an automatic split of that chunk. It sends a splitVector command to the primary that owns the chunk, asking it to evaluate the chunk range and return any potential split points. If the primary replies with valid points, then the mongos will attempt to split on those points. If there are no valid split points, then the mongos will retry this process when the writes to that chunk reach 40%, then 60%, of the max chunk size.
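If you are curious, you can run that same internal command yourself, connected directly to the shard primary, to see what split points it would suggest. This is just an illustrative sketch - the namespace and key pattern are placeholders and the exact option names can vary between versions:

    db.adminCommand({
        splitVector: "mydb.mycoll",
        keyPattern: { user_id: 1 },
        min: { user_id: MinKey },
        max: { user_id: MaxKey },
        maxChunkSizeBytes: 64 * 1024 * 1024   // evaluate against the 64MB default
    })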
As you can see, this does not wait for a chunk to reach the max size before splitting; in fact it should happen long before that, and with a normally operating cluster you should not see such large chunks in general.
What's up with this? How can the first 3 shards/replica sets have an
average size greater than 64mb when that is set to be chunkSize? Rs_2
is 119mb!
The only thing preventing large chunks from occurring is the auto-split functionality described above. Your average chunk sizes suggest that something is preventing the chunks from being split. There are a couple of possible reasons for this, but the most common is that the shard key is not granular enough.
If your chunk ranges get down to a single key value then no further splits are possible and you get "jumbo" chunks. I would need to see the ranges to be sure, but you can probably manually inspect them easily enough from sh.status(true); for a more easily digestible version, take a look at this Q&A I posted about determining the chunk distribution.
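For example, the chunk ranges for a collection can be pulled straight out of the config database (the namespace is a placeholder), which is essentially what sh.status(true) prints:

    db.getSiblingDB("config").chunks.find(
        { ns: "mydb.mycoll" },
        { min: 1, max: 1, shard: 1, jumbo: 1 }
    ).sort({ min: 1 })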
If that is the issue you only really have 2 choices - either live with the jumbo chunks (and possibly increase the max chunk size so that they can be moved around - the migration of any chunk over the max size will be aborted and the chunk tagged as "jumbo" by the mongos), or re-shard the data with a more granular shard key that prevents the creation of single-key chunks.
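Raising the max chunk size is done through the config database via the mongos (the value is in MB; 128 here is just an example):

    db.getSiblingDB("config").settings.save({ _id: "chunksize", value: 128 })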
Rs_2 has 27.53% of the data when it should have 16.6%.
This is a fairly common misconception about the balancer - it does not balance based on data size, it just balances the number of chunks (which, as you can see, are nicely distributed) - from that perspective a chunk with 0 documents in it counts just the same as one with 250k documents. Hence, the data imbalance comes from the imbalance within the chunks themselves (some contain a lot more data than others).
What should I do here? I can manually find chunks that are large and
split them, but that is a pain. Do I lower my chunkSize?
Lowering the chunk size would cause the mongos to check for split points more frequently, but it won't help if the splits are failing (which your chunk size averages suggest is the case); it will just fail more often. As a first step I would find the largest chunks (see the Q&A link above) and split those first.
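As a rough sketch of how that check works (run through the mongos; the namespace and key pattern are placeholders), you can run the dataSize command over each chunk's range and look for the largest results, then split those chunks with sh.splitFind()/sh.splitAt() as shown earlier:

    var ns = "mydb.mycoll";
    db.getSiblingDB("config").chunks.find({ ns: ns }).forEach(function(chunk) {
        // dataSize scans the whole range, so run this at a quiet time
        var res = db.getSiblingDB("mydb").runCommand({
            dataSize: ns,
            keyPattern: { user_id: 1 },
            min: chunk.min,
            max: chunk.max
        });
        print(chunk.shard + "\t" + tojson(chunk.min) + "\t" + res.size + "\t" + res.numObjects);
    });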
If you are going to do any manual splitting or moving, I recommend turning off the balancer so that it is not holding the metadata lock and does not kick in as soon as you start splitting. It's also generally a good idea to do it at a low-traffic time because otherwise the auto-splitting I described above could also interfere.
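Turning the balancer off and back on from the mongo shell is straightforward:

    sh.setBalancerState(false)   // disable the balancer
    sh.getBalancerState()        // confirm it is off
    sh.isBalancerRunning()       // make sure no migration is still in flight
    // ... do the manual splits/moves ...
    sh.setBalancerState(true)    // re-enable it when done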
After a quick search I don't have anything generic immediately to hand, but I have in the past seen scripts used to automate this process. They tend to need to be customized to fit the particular issue (imagine an imbalance due to a monotonic shard key versus an issue with chunk data density, for example).
Best Answer
Yes, the collection stays available during the process. In fact, the sh.shardCollection command itself takes only milliseconds to execute, but the actual balancing work can take hours or days to complete. Balancing does not lock the collection; it just moves chunks from one shard to another.
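As a quick illustration (the database, collection, and shard key are placeholders), the command returns almost immediately and you can then watch the data spread out with getShardDistribution():

    sh.enableSharding("mydb")
    sh.shardCollection("mydb.mycoll", { user_id: 1 })   // returns in milliseconds

    // later, check how the balancing is progressing
    db.getSiblingDB("mydb").mycoll.getShardDistribution()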
Just be careful when selecting the shard key; you should have a good understanding of the basics of sharding and how it works.