MongoDB shard keeps growing, I don’t know why

mongodb, sharding

I've had a project dumped on me and I'm trying to get up to speed with Mongo, but I have an immediate firefight: one shard in my Mongo cluster just keeps growing and growing, adding more files daily, and I can't find the cause. Our cluster is simple: 12 shards, each a replica set. "rs1" is the primary shard, and it's the one that keeps gobbling up more and more disk space while the other 11 shards haven't grown at all. Our data is simple too: each day we create one new collection for each of our 3 document types, and we keep those for 90 days. So at any time we have roughly 270 active collections, and nightly we drop the oldest collections as new collections are created for upcoming days.
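To make the rotation concrete, the nightly job boils down to something like this (the collection names and shard key field here are made up for illustration; the real ones follow the same daily pattern):

    // Create (and shard) tomorrow's collection for one of the document types:
    sh.shardCollection("prodLN.docTypeA_20140612", { docId: 1 })

    // Drop the same document type's collection from 90+ days ago:
    db.getSiblingDB("prodLN").getCollection("docTypeA_20140313").drop()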

The balancer seems to be running fine; I can see the chunk counts for each collection staying balanced across the shards as normal.
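Chunk distribution for one of the daily collections can be eyeballed with a quick aggregation against the config database via a mongos, something like this (the collection name is illustrative):

    db.getSiblingDB("config").chunks.aggregate([
        { $match: { ns: "prodLN.docTypeA_20140610" } },     // one daily collection
        { $group: { _id: "$shard", chunks: { $sum: 1 } } }  // chunk count per shard
    ])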

Is there some way I can STOP Mongo from sending any more data to my problem shard while I get a handle on what's going on? I've already tried setting a maxSize on it, but that hasn't prevented it from growing. Right now rs1 is taking up about 885 of the 2 GB data files on disk. (Note that this is Mongo 2.6.)

  shards:
      {  "_id" : "rs1",  "host" : "rs1/psc1c250:39080,psc2c250:39080",  "maxSize" : 1500000 }
    {  "_id" : "rs10",  "host" : "rs10/psc1c259:39080,psc2c259:39080" }
    {  "_id" : "rs11",  "host" : "rs11/psc1c260:39080,psc2c260:39080" }
    {  "_id" : "rs12",  "host" : "rs12/psc1c261:39080,psc2c261:39080" }
    {  "_id" : "rs2",  "host" : "rs2/psc1c251:39080,psc2c251:39080" }
    {  "_id" : "rs3",  "host" : "rs3/psc1c252:39080,psc2c252:39080" }
    {  "_id" : "rs4",  "host" : "rs4/psc1c253:39080,psc2c253:39080" }
    {  "_id" : "rs5",  "host" : "rs5/psc1c254:39080,psc2c254:39080" }
    {  "_id" : "rs6",  "host" : "rs6/psc1c255:39080,psc2c255:39080" }
    {  "_id" : "rs7",  "host" : "rs7/psc1c256:39080,psc2c256:39080" }
    {  "_id" : "rs8",  "host" : "rs8/psc1c257:39080,psc2c257:39080" }
    {  "_id" : "rs9",  "host" : "rs9/psc1c258:39080,psc2c258:39080" }
  databases:
    {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
    {  "_id" : "prodLN",  "partitioned" : true,  "primary" : "rs1" }

And here is db.stats() from the rs1 primary (values scaled to GB):

rs1:PRIMARY> db.stats(1024*1024*1024)
{
    "db" : "prodLN",
    "collections" : 319,
    "objects" : 301960471,
    "avgObjSize" : 3532.3332748146363,
    "dataSize" : 993,
    "storageSize" : 1626,
    "numExtents" : 4436,
    "indexes" : 634,
    "indexSize" : 25,
    "fileSize" : 1765,
    "nsSizeMB" : 16,
    "dataFileVersion" : {
            "major" : 4,
            "minor" : 5
    },
    "extentFreeList" : {
            "num" : 1820,
            "totalSize" : 81
    },
    "ok" : 1
}

Best Answer

If you have one database with hundreds of collections (your sh.status() output shows a single partitioned database, prodLN, which suggests that is the case), then here is the likely scenario:

  • The prodLN database has rs1 as its primary shard
  • Therefore all new collections created within that database will be created on rs1
  • If you then shard the collections (data is sharded at the collection level, not the database level), the data will be migrated from rs1 to the other shards, but you must then rely on MongoDB to reuse the space freed post-delete (this also assumes a shard key that distributes the data well across the shards); a quick way to check which collections are actually sharded is shown after this list
  • You mention you are on 2.6, and there are lots of answers out there about deleted space reuse already (some pointers: 1 2 3); let's just say it's not very efficient for certain usage patterns (including yours)
  • Someone has added a maxSize parameter to the rs1 shard, so once it is considered "full" it will never be a target for chunk migrations, but that does not override the primary shard behavior: new collections are still created on the primary shard, take up space there, and only then get moved off (in part) by the balancer
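
To verify which collections are actually sharded, you can compare the config metadata against the database's own collection list; a minimal check, run through a mongos:

    // Sharded collections in prodLN, with their shard keys:
    db.getSiblingDB("config").collections.find(
        { _id: /^prodLN\./, dropped: false },
        { _id: 1, key: 1 }
    )

    // Every collection in the database; anything missing from the list
    // above is unsharded and lives only on the primary shard (rs1):
    db.getSiblingDB("prodLN").getCollectionNames()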

What can you do? Well, you have a few options, but they all require changes and/or downtime for the current database:

  • Start using more than one database; make sure the new databases have different primary shards (use movePrimary if not), start writing data there, and maybe rotate around several (see the sketch after this list)
  • You can use movePrimary on the current database, but you have to stop writing to any unsharded collections first (read the behavior section in the link above). Depending on how much data there is in unsharded collections, this could take some time.
  • You can attempt to reclaim space on the problem shard to buy yourself some time. If you want to minimize downtime, I would recommend a rolling resync, but that is only possible if you have 3 voting members in each set; based on your output you may only have two, unless arbiters have been excluded. You could temporarily add a third member to the problem shard to perform the resyncs, then remove it later.
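
For the first two options, here is a rough sketch of the commands involved (the database name, collection name, and shard key field are placeholders, not your real ones):

    // Enable sharding on a fresh database; mongos assigns it a primary shard:
    sh.enableSharding("prodLN2")

    // If the assigned primary shard happens to be rs1 anyway, move it elsewhere
    // (stop writes to unsharded collections in that database first):
    db.adminCommand({ movePrimary: "prodLN2", to: "rs2" })

    // Then shard each new daily collection in the new database as it is created:
    sh.shardCollection("prodLN2.docTypeA_20140612", { docId: 1 })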