MongoDB – Repair Database on Replica in Sharded Setup

Tags: mongo-repair, mongodb, sharding

I am using MongoDB sharding. I have deleted some of the collections in my database, and to reclaim the disk space I need to run a repair, which I cannot run through mongos in production.
So my plan is to run the repair on each replica set member one at a time, stepping the primary down and swapping it with a secondary, so that the production system keeps working.
I have three questions:

  1. Is this the right approach to repairing the database?
  2. I have a big database, about 1 TB of data, so a repair might take a day or two. My application creates a new sharded database (bucket) every day, so while the repair is running on one replica set member, the metadata on the config servers will be updated. Will that cause an error?
  3. While deleting some old databases from my cluster, I stopped the balancer and restarted it after the delete, but the database is still not completely deleted: it still shows 2 or 3 GB of space used. When I check the remaining database, it shows no collections in it.

Best Answer

In my opinion, if your fragmentation is not over 15-20%, it's not worth doing (unless you are running out of disk space). What I would do is:

1) Add an arbiter to each replica set (optional)

2) Shut down one of the secondaries and delete its data directory

3) Start the secondary and let it perform an initial sync, which removes the fragmentation

(At this point you can evaluate whether the disk space gained is worth continuing.)

4) When the secondary has caught up, do the same for the next secondary

5) When all secondaries are done, step down the primary and do the same on the former primary

6) Remove the arbiter
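Both the 15-20% threshold and the evaluation after step 3 need a fragmentation figure. Below is a rough sketch, assuming WiredTiger on MongoDB 4.4 or later, where running `db.runCommand({dbStats: 1, freeStorage: 1})` directly on the member reports `totalFreeStorageSize` (reusable free space in data and index files) and `totalSize` (storage plus index size). Treating their ratio as "fragmentation" is an assumption, and the function names and the 0.15 default are illustrative choices, not a MongoDB API.

```python
from typing import Mapping

# Lower end of the 15-20% range suggested in the answer (a judgment call).
DEFAULT_THRESHOLD = 0.15


def fragmentation_ratio(db_stats: Mapping[str, float]) -> float:
    """Share of allocated storage (data + indexes) that is free but still on disk.

    Assumes db_stats is the dbStats output (with freeStorage: 1) taken on a
    single replica set member, containing totalFreeStorageSize and totalSize.
    """
    total = db_stats["totalSize"]
    if total <= 0:
        return 0.0
    return db_stats["totalFreeStorageSize"] / total


def resync_worth_it(
    db_stats: Mapping[str, float],
    threshold: float = DEFAULT_THRESHOLD,
    low_on_disk: bool = False,
) -> bool:
    # Resync only when fragmentation exceeds the threshold,
    # or when the member is running out of disk space anyway.
    return low_on_disk or fragmentation_ratio(db_stats) > threshold


if __name__ == "__main__":
    stats = {"totalSize": 1000, "totalFreeStorageSize": 250}
    print(fragmentation_ratio(stats), resync_worth_it(stats))
```

With 250 of 1000 units free, the ratio is 0.25 and the resync is considered worthwhile; at 10% it would not be, unless `low_on_disk` is set.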

Important: your oplog must be large enough to hold all operations that happen during the initial sync; otherwise you will need to resize it. (The same applies if you choose the repair option.)
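A back-of-the-envelope version of this oplog check: compare the oplog window (the "log length start to end" that `rs.printReplicationInfo()` prints in the mongo shell) with an estimate of how long the initial sync will take. The sketch assumes the sync time is roughly data size divided by a sustained copy throughput that you measure yourself; the 50 MB/s figure, the safety factor of 2, and the function names are hypothetical.

```python
def estimated_sync_hours(data_bytes: float, throughput_bytes_per_s: float) -> float:
    # Rough estimate: time to copy all data at a constant throughput.
    # Index builds during initial sync add time that this ignores.
    return data_bytes / throughput_bytes_per_s / 3600


def oplog_is_large_enough(
    oplog_window_hours: float, sync_hours: float, safety_factor: float = 2.0
) -> bool:
    # The oplog must still contain the entries written since the sync started,
    # so its window should comfortably exceed the expected sync duration.
    return oplog_window_hours >= sync_hours * safety_factor


if __name__ == "__main__":
    one_tb = 10**12                       # ~1 TB, as in the question
    hours = estimated_sync_hours(one_tb, 50 * 10**6)  # assumed 50 MB/s
    print(round(hours, 2))                # about 5.56 hours
    print(oplog_is_large_enough(24, hours), oplog_is_large_enough(8, hours))
```

At the assumed throughput, 1 TB takes about 5.6 hours, so a 24-hour oplog window passes with a 2x margin while an 8-hour window does not.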

For your second question: replica set members don't hold the cluster metadata; that lives on the config servers. As long as a majority of each replica set is always available, you will not face any issues.
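A replica set can keep or elect a primary only while a strict majority of its voting members is up. The sketch below, with hypothetical member names, orders members the way the steps describe (secondaries first, the primary last after a step-down, arbiters skipped because they hold no data) and checks that taking any single data-bearing member offline still leaves a majority of votes. It models vote counting only, not priorities or other election rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Member:
    name: str                # hypothetical host name
    is_primary: bool = False
    is_arbiter: bool = False
    votes: int = 1


def majority(total_votes: int) -> int:
    # Strict majority of voting members.
    return total_votes // 2 + 1


def rolling_resync_order(members: List[Member]) -> List[str]:
    # Arbiters hold no data, so they are never resynced.
    data = [m for m in members if not m.is_arbiter]
    secondaries = [m.name for m in data if not m.is_primary]
    primaries = [m.name for m in data if m.is_primary]
    # The primary goes last; it is stepped down before its turn.
    return secondaries + primaries


def majority_kept_with_one_down(members: List[Member]) -> bool:
    total = sum(m.votes for m in members)
    return all(
        total - m.votes >= majority(total)
        for m in members
        if not m.is_arbiter
    )


if __name__ == "__main__":
    rs = [Member("a", is_primary=True), Member("b"), Member("c")]
    print(rolling_resync_order(rs), majority_kept_with_one_down(rs))
```

For example, a set with only two data-bearing members loses its majority while one of them resyncs, and adding an arbiter (a third vote) restores it; a three-member set keeps its majority with one member down.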