MongoDB sharded cluster chunk distribution

docker, mongodb, mongodb-3.6, sharding

I have deployed a MongoDB cluster with two shards (two replica sets) using docker-compose.

Unfortunately I'm having problems with chunk distribution between the shards. I'm new to MongoDB and to databases in general, so these problems may be the result of me misunderstanding the topic. The questions describing the problems are at the bottom of the post; below is the configuration of my sharded cluster.

Sharded cluster

It consists of:

  1. A config replica set of three mongod instances, each configured with: mongod --configsvr --replSet configReplicaSet --bind_ip 0.0.0.0 --dbpath /data/db --port 27019
  2. The first shard replica set of three mongod instances, each configured with: mongod --shardsvr --replSet shardReplicaSet1 --bind_ip 0.0.0.0 --dbpath /data/db --port 27017
  3. The second shard replica set of three mongod instances, each configured with: mongod --shardsvr --replSet shardReplicaSet2 --bind_ip 0.0.0.0 --dbpath /data/db --port 27017
  4. A single mongos instance. Its container starts with /bin/bash despite the command in docker-compose; this is because the entrypoint defined in the Docker image is /bin/bash, which probably overrides the docker-compose command.
  5. A single mongo shell instance.

Here is docker-compose.yaml:

version: '2.1'
services:
  mongors1-1:
    container_name: mongors1-1
    image: mongodb-server-shard
    command: mongod --shardsvr --replSet shardReplicaSet1 --bind_ip 0.0.0.0  --dbpath /data/db --port 27017
    ports:
    - 27011:27017
    expose:
    - "27017"
    networks:
      default:
        ipv4_address: 172.19.0.11
    stdin_open: true
    tty: true

  mongors1-2:
    container_name: mongors1-2
    image: mongodb-server-shard
    command: mongod --shardsvr --replSet shardReplicaSet1 --bind_ip 0.0.0.0  --dbpath /data/db --port 27017
    ports:
    - 27012:27017
    expose:
    - "27017"
    networks:
      default:
        ipv4_address: 172.19.0.12
    stdin_open: true
    tty: true

  mongors1-3:
    container_name: mongors1-3
    image: mongodb-server-shard
    command: mongod --shardsvr --replSet shardReplicaSet1 --bind_ip 0.0.0.0  --dbpath /data/db --port 27017
    ports:
    - 27013:27017
    expose:
    - "27017"
    networks:
      default:
        ipv4_address: 172.19.0.13
    stdin_open: true
    tty: true

  mongors2-1:
    container_name: mongors2-1
    image: mongodb-server-shard
    command: mongod --shardsvr --replSet shardReplicaSet2 --bind_ip 0.0.0.0  --dbpath /data/db --port 27017
    ports:
    - 27021:27017
    expose:
    - "27017"
    networks:
      default:
        ipv4_address: 172.19.0.21
    stdin_open: true
    tty: true

  mongors2-2:
    container_name: mongors2-2
    image: mongodb-server-shard
    command: mongod --shardsvr --replSet shardReplicaSet2 --bind_ip 0.0.0.0  --dbpath /data/db --port 27017
    ports:
    - 27022:27017
    expose:
    - "27017"
    networks:
      default:
        ipv4_address: 172.19.0.22
    stdin_open: true
    tty: true

  mongors2-3:
    container_name: mongors2-3
    image: mongodb-server-shard
    command: mongod --shardsvr --replSet shardReplicaSet2 --bind_ip 0.0.0.0  --dbpath /data/db --port 27017
    ports:
    - 27023:27017
    expose:
    - "27017"
    networks:
      default:
        ipv4_address: 172.19.0.23
    stdin_open: true 
    tty: true

  mongocfg1:
    container_name: mongocfg1
    image: mongodb-server-shard
    command: mongod --configsvr --replSet configReplicaSet --bind_ip 0.0.0.0  --dbpath /data/db --port 27019
    ports:
    - 2001:27019
    expose:
    - "27019"
    networks:
      default:
        ipv4_address: 172.19.0.101
    stdin_open: true
    tty: true

  mongocfg2:
    container_name: mongocfg2
    image: mongodb-server-shard
    command: mongod --configsvr --replSet configReplicaSet --bind_ip 0.0.0.0 --dbpath /data/db --port 27019
    ports:
    - 2002:27019
    expose:
    - "27019"
    networks:
      default:
        ipv4_address: 172.19.0.102
    stdin_open: true
    tty: true

  mongocfg3:
    container_name: mongocfg3
    image: mongodb-server-shard
    command: mongod --configsvr --replSet configReplicaSet --bind_ip 0.0.0.0 --dbpath /data/db --port 27019
    ports:
    - 2003:27019
    expose:
    - "27019"
    networks:
      default:
        ipv4_address: 172.19.0.103
    stdin_open: true 
    tty: true

  mongo-client:
    container_name: mongo-client1
    image: mongodb-client
    ports:
    - 5000:5000
    expose:
    - "5000"
    networks:
      default:
        ipv4_address: 172.19.0.50
    stdin_open: true 
    tty: true

  mongos1:
    container_name: mongos1
    image: mongodb-mongos
    command: mongos --configdb configReplicaSet/mongocfg1:27019,mongocfg2:27019 --bind_ip 0.0.0.0
    ports:
    - 3300:3300
    expose:
    - "3300"
    - "27017"
    networks:
      default:
        ipv4_address: 172.19.0.150
    stdin_open: true 
    tty: true

  # iperf3-server:
  #   container_name: iperf3-server
  #   image: networkstatic/iperf3
  #   expose:
  #   - "5201"
  #   command: iperf -s
  #   networks:
  #    default:
  #       ipv4_address: 172.19.0.200
  #   stdin_open: true 
  #   tty: true

  # iperf3-client:
  #   container_name: iperf3-client
  #   image: networkstatic/iperf3
  #   expose:
  #   - "5202"
  #   command: iperf -c 172.19.0.200
  #   networks:
  #    default:
  #       ipv4_address: 172.19.0.201
  #   stdin_open: true 
  #   tty: true

I'm not using the official Docker Hub images but my own. They are very simple, basic installations; the difference is that the containers running mongod have only mongod installed instead of the whole MongoDB installation, and similarly the mongos container has only mongos installed. I think this might be a source of the problem.

The iperf3 containers were used to measure the speed of the Docker bridged network; they showed over 50 Gbit/s of bandwidth, so the network should not be the source of the problems.

The further configuration of the cluster is as follows:

  1. I connect from mongo-client1 (after using docker attach) to one of the config servers and run:

    rs.initiate(
      {
        _id: "configReplicaSet",
        configsvr: true,
        members: [
          { _id: 0, host: "mongocfg1:27019" },
          { _id: 1, host: "mongocfg2:27019" },
          { _id: 2, host: "mongocfg3:27019" }
        ]
      }
    )

  2. Similarly, I connect to one of the shardReplicaSet1 mongod instances and run:

    rs.initiate(
      {
        _id: "shardReplicaSet1",
        members: [
          { _id: 0, host: "mongors1-1:27017" },
          { _id: 1, host: "mongors1-2:27017" },
          { _id: 2, host: "mongors1-3:27017" }
        ]
      }
    )

  3. In a similar way I initiate shardReplicaSet2.

  4. I use docker attach on the mongos1 container and start mongos with: mongos --configdb configReplicaSet/mongocfg1:27019,mongocfg2:27019 --bind_ip 0.0.0.0

  5. I connect to mongos1 from mongo-client1 and run sh.addShard("shardReplicaSet1/mongors1-1:27017") and sh.addShard("shardReplicaSet2/mongors2-1:27017"). I don't know if this is necessary, but I use the hostname of the primary member of each replica set; a quick way to verify the result is shown below.
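
For reference, the registration of both shards can be verified from the mongos connection; sh.status() or the listShards admin command show the shard list:

    // Run in the mongo shell connected to mongos1.
    sh.status()                          // prints the shards, databases and chunk summary
    db.adminCommand({ listShards: 1 })   // raw command returning the same shard list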

The problems

  1. While connected to mongos1, I create a database named "key_v" with a collection named "performance". Then I
    enable sharding on the database (sh.enableSharding("key_v")),
    then on the collection using a shard key:
    sh.shardCollection("key_v.performance", {"address": 1})

    Then I use a for loop to insert many documents into it:

    for (i = 0; i <= 10000; i++) {
        db.performance.insert({ "address": i, "array1": [999, "foo"] });
    }

Oh, and I had previously set the chunk size to 1 MB (the snippet below shows how). After the loop finishes, the output of db.performance.getShardDistribution() shows that all chunks are located on only one of the shards; a few minutes later the chunks are in fact distributed evenly between the shards. Here is my question: shouldn't mongos distribute the chunks between the shards during the execution of the loop, instead of directing them all to one shard? If this can be configured, how should I achieve that? By zones (https://docs.mongodb.com/manual/tutorial/manage-shard-zone/)? Or maybe the shard key is incorrect?
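
For completeness, this is roughly how the 1 MB chunk size was set, via the config database on mongos (the value is in megabytes):

    use config
    db.settings.save({ _id: "chunksize", value: 1 })   // chunk size in MB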

  2. Another related case: I try to import a large MongoDB database in BSON format using mongorestore. I'm doing it from outside of the Docker network:
    mongorestore --host 127.19.0.150:3300 -d import1 -c test /path/base.bson
    The import works well, but all the chunks end up on one of the shards. Furthermore, the migration of these chunks to the second shard is very slow: the database weighs 1.3 GB, and after 40 minutes only 32% of the chunks have been migrated. The data schema consists of multiple fields; I have chosen one field with an Int32 datatype as the shard key, but its cardinality is very low (15% of documents share the same value for it). Could this be the cause? The estimated data per chunk is 529KiB (getShardDistribution() results); per-shard chunk counts can also be checked as shown below.
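
For reference, the per-shard chunk counts can be checked from mongos by querying the config database, for example:

    use config
    db.chunks.aggregate([
      { $match: { ns: "import1.test" } },
      { $group: { _id: "$shard", chunks: { $sum: 1 } } }
    ])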

All containers run on a computer with 32 GB of RAM and an i7-6700HQ; could a slow HDD be a bottleneck resulting in such slow chunk migration?

Best Answer

After the loop finishes, the output of db.performance.getShardDistribution() shows that all chunks are located on only one of the shards; a few minutes later the chunks are in fact distributed evenly between the shards. Here is my question: shouldn't mongos distribute the chunks between the shards during the execution of the loop, instead of directing them all to one shard?

The issue is that you are using a monotonically increasing shard key. This will result in all inserts targeting the single shard (aka a "hot shard") that currently has the chunk range representing the highest shard key value. Data will eventually be rebalanced to other shards, but this is not an effective use of sharding for write scaling. You need to choose a more appropriate shard key. If your use case does not require range queries, you could also consider using a hashed shard key on the address field. See Hashed vs Ranged Sharding for an illustration of the expected outcomes.
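
As a sketch of that option: if range queries on address are not needed, the collection could be sharded on a hashed key from the start (the shard key of an existing sharded collection cannot be changed in MongoDB 3.6, so this has to be done before loading data):

    // After dropping/recreating the collection, shard it on a hashed key.
    sh.enableSharding("key_v")
    sh.shardCollection("key_v.performance", { address: "hashed" })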

Another related case: I try to import a large MongoDB database in BSON format using mongorestore. I'm doing it from outside of the Docker network: mongorestore --host 127.19.0.150:3300 -d import1 -c test /path/base.bson The import works well, but all the chunks end up on one of the shards.

If the outcome is similar (all inserts going to a single shard and then being rebalanced), this also suggests a poor shard key choice.

If you are bulk inserting into an empty sharded collection there is an approach you can use to minimize rebalancing: pre-splitting chunk ranges based on the known distribution of shard key values in your existing data.
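
A minimal sketch of that approach, assuming a ranged shard key and split points derived from your actual data (the field name and split values below are hypothetical), run against mongos before the restore:

    sh.enableSharding("import1")
    sh.shardCollection("import1.test", { myShardKey: 1 })   // hypothetical shard key

    // Split the empty collection at chosen shard key values...
    sh.splitAt("import1.test", { myShardKey: 1000 })
    sh.splitAt("import1.test", { myShardKey: 2000 })

    // ...and place some of the resulting empty chunks on the second shard up front
    // (shard names default to the replica set names in this deployment).
    sh.moveChunk("import1.test", { myShardKey: 1000 }, "shardReplicaSet2")

    // Optionally stop the balancer for the duration of the restore.
    sh.stopBalancer()
    // ... run mongorestore ...
    sh.startBalancer()

For a hashed shard key, the numInitialChunks option of shardCollection achieves a similar effect on an empty collection.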

The data schema consists of multiple fields; I have chosen one field with an Int32 datatype as the shard key, but its cardinality is very low (15% of documents share the same value for it). Could this be the cause?

Low cardinality shard keys will definitely cause issues with data distribution. The most granular chunk range possible represents a single shard key value, so if a large percentage of your documents share the same shard key value, those documents will eventually end up in indivisible jumbo chunks, which are ignored by the balancer.
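
Whether that is already happening can be checked from the config database; chunks flagged by the balancer carry a jumbo marker, for example:

    use config
    db.chunks.find({ ns: "import1.test", jumbo: true })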

All containers run on a computer with 32 GB of RAM and an i7-6700HQ; could a slow HDD be a bottleneck resulting in such slow chunk migration?

There isn't enough information to determine whether your disk is the most limiting factor, but running a whole sharded cluster on a single computer with a slow HDD will certainly add resource contention. Choosing appropriate shard keys should minimize the need for data migration unless you are adding or removing shards from your deployment.

Assuming you are using a recent version of MongoDB with the WiredTiger storage engine as the default (MongoDB 3.2+), you will definitely want to explicitly set --wiredTigerCacheSizeGB to limit the internal cache size for the mongod instances: by default each mongod reserves up to roughly 50% of (RAM - 1 GB) for its cache, which is far too much when nine mongod instances share a single 32 GB machine. See: To what size should I set the WiredTiger internal cache?.
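
As a quick sanity check (run against an individual shard or config mongod rather than mongos), the cache limit a mongod ended up with can be read from serverStatus(), e.g.:

    // Reports the configured WiredTiger cache ceiling in bytes.
    db.serverStatus().wiredTiger.cache["maximum bytes configured"]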