So, after the loop finishes, db.collection.getShardDistribution() shows that all chunks are located on only one of the shards; a few minutes later, however, the chunks are distributed evenly between shards. Here is my question: shouldn't mongos distribute chunks between shards during execution of the loop, instead of directing them all to one shard?
The issue is that you are using a monotonically increasing shard key. This results in all inserts targeting the single shard (a "hot shard") that currently owns the chunk range covering the highest shard key values. Data will eventually be rebalanced to the other shards, but this is not an effective use of sharding for write scaling. You need to choose a more appropriate shard key. If your use case does not require range queries, you could also consider using a hashed shard key on the address field. See Hashed vs Ranged Sharding for an illustration of the expected outcomes.
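For reference, here is a minimal sketch of enabling a hashed shard key in the mongo shell (the database and collection names are placeholders for illustration):

    // Shard the collection on a hash of the address field so inserts
    // spread across chunk ranges instead of hitting one hot shard.
    sh.enableSharding("mydb")
    sh.shardCollection("mydb.test", { address: "hashed" })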
Another related case: I am trying to import a large MongoDB database in BSON format using mongorestore, from outside the Docker network: mongorestore --host 127.19.0.150:3300 -d import1 -c test /path/base.bson. The import works, but all the chunks end up on one of the shards.
If the outcome is similar (all inserts going to a single shard and then being rebalanced), this also suggests a poor shard key choice.
If you are bulk inserting into an empty sharded collection, there is an approach you can use to minimize rebalancing: pre-splitting chunk ranges based on the known distribution of shard key values in your existing data, as sketched below.
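As a rough sketch (the namespace, shard key field, split points, and shard names are all hypothetical here), pre-splitting in the mongo shell looks like this:

    // Shard the empty collection on a ranged key first.
    sh.shardCollection("import1.test", { userId: 1 })

    // Split at boundaries chosen from the known distribution of the
    // existing data, so each shard can own a range before the import.
    sh.splitAt("import1.test", { userId: 25000 })
    sh.splitAt("import1.test", { userId: 50000 })
    sh.splitAt("import1.test", { userId: 75000 })

    // Move the empty chunks to their target shards up front; moving
    // empty chunks is cheap compared to rebalancing after the import.
    sh.moveChunk("import1.test", { userId: 0 },     "shard0000")
    sh.moveChunk("import1.test", { userId: 25000 }, "shard0001")
    sh.moveChunk("import1.test", { userId: 50000 }, "shard0002")
    sh.moveChunk("import1.test", { userId: 75000 }, "shard0003")

With the chunk ranges already assigned, a bulk import writes to all shards in parallel rather than funnelling through one and relying on the balancer afterwards.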
The database schema consists of multiple fields. I have chosen one field with the Int32 datatype as the shard key, but its cardinality is very low: 15% of documents share the same value for it. Could this be the source of the problem?
Low cardinality shard keys will definitely cause issues with data distribution. The most granular chunk range possible represents a single shard key value, so if a large percentage of your documents share the same shard key value, they will eventually lead to indivisible jumbo chunks, which are ignored by the balancer.
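To gauge how skewed a candidate key is before committing to it, you could run an aggregation along these lines (the field name is a placeholder):

    // Count documents per shard key value; a dominant value signals
    // future jumbo chunks, since a chunk cannot split below one value.
    db.test.aggregate([
      { $group: { _id: "$myShardKeyField", count: { $sum: 1 } } },
      { $sort: { count: -1 } },
      { $limit: 10 }
    ])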
All containers run on a computer with 32 GB of RAM and an i7-6700HQ. Could a slow HDD be a bottleneck resulting in such slow chunk migration?
There isn't enough information to determine whether your disk is the most limiting factor, but running a sharded cluster on a single computer with a slow HDD will certainly add resource contention challenges. Choosing appropriate shard keys should minimize the need for data migration unless you are adding or removing shards from your deployment.
Assuming you are using a recent version of MongoDB with the WiredTiger storage engine as the default (MongoDB 3.2+), you will definitely want to explicitly set --wiredTigerCacheSizeGB to limit the internal cache size for your mongod instances. See: To what size should I set the WiredTiger internal cache?.
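For instance (the port, path, replica set name, and the 2 GB figure are illustrative assumptions, not recommendations), each shard's mongod could be started with:

    # Cap the WiredTiger cache so several mongod instances can
    # coexist in 32 GB of RAM without starving each other.
    mongod --shardsvr --replSet rs0 --port 27018 \
           --dbpath /data/shard0 --wiredTigerCacheSizeGB 2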
Best Answer
Replacing mongodb:// at the start with ClusterADS-shard-0/ (which is the replica set) and removing /admin?... made it work.
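That matches the replicaSetName/host1:port,host2:port form that mongorestore accepts for --host. A sketch of the working invocation (the hostnames and ports are placeholders; the database, collection, and path are reused from the earlier command):

    # --host takes the replica set name prefix plus a plain host list,
    # not a mongodb:// URI with connection options.
    mongorestore --host "ClusterADS-shard-0/host1:27017,host2:27017" \
                 -d import1 -c test /path/base.bson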