Are there any benefits in disabling mongodb journaling besides performance gain?
The only other marginal benefit is if you are using a 32-bit build (which is also not recommended for production use). Since 32-bit builds are limited to ~2GB of addressable data for memory-mapped files, the journal is off by default to allow for more data (with the journal enabled, the 32-bit data limit is halved to ~1GB). In all other cases, the journal is enabled by default. For 64-bit systems, the addressable virtual memory is not a limiting factor for data size.
Any performance gain from disabling the journal is a tradeoff for data & operational risk. I would strongly advise against disabling the journal unless your data is ephemeral (i.e. you don't care about losing it or can easily recreate it from another source of truth).
If you are concerned about possible contention or overhead from the journal's disk I/O, you can always move the journal to a separate volume. NOTE: moving the journal to a separate volume may also change your backup strategy, particularly if you were relying on filesystem snapshots to take a comparatively quick backup without having to fsyncLock() and quiesce your database.
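As a rough sketch of the separate-volume approach: with mongod stopped, the journal directory under the dbPath can be moved onto a dedicated volume and replaced with a symlink. The paths below are illustrative stand-ins (temp directories so the sketch is self-contained), not taken from any particular deployment.

```shell
# Illustrative layout; in a real deployment use your actual dbPath
# (e.g. /var/lib/mongodb) and a mounted dedicated volume.
DBPATH=$(mktemp -d)      # stands in for the mongod dbPath
VOL=$(mktemp -d)         # stands in for the dedicated journal volume
mkdir "$DBPATH/journal"  # mongod normally creates this alongside the data files

# With mongod stopped: move the journal and leave a symlink in its place
mv "$DBPATH/journal" "$VOL/journal"
ln -s "$VOL/journal" "$DBPATH/journal"   # mongod follows the symlink on restart
```

Note that after this change a filesystem snapshot of the dbPath volume alone no longer captures the journal, which is why the backup-strategy caveat above applies.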
I know journaling provides write durability and crash resiliency. I could tune the syncdelay parameter to flush the data files more frequently, but the MongoDB docs recommend against this.
Writes to the journal are fast append-only operations. As you've noted, these provide durability and crash resiliency. If MongoDB stops unexpectedly, it can recover from the journal and data will be in a consistent state.
Without the journal, your data is in an unknown state after an unexpected shutdown and you need to validate it by running repair or (in the case of a replica set node) resync from a known-good copy of the data.
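For example, the standalone repair path looks like the sketch below (the dbPath is illustrative; repair rewrites the data files and discards anything unrecoverable, so for a replica set member resyncing from a healthy peer is usually the better option):

```shell
# With mongod stopped: validate/repair the data files after an unclean
# shutdown without a journal. Run against your actual dbPath.
mongod --dbpath /var/lib/mongodb --repair
```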
What effect would it have on my mongodb setup and why should I not set syncdelay on production setups?
The syncdelay setting controls how frequently the in-memory view of the data is synced with the on-disk view (aka the "background flush interval").
By default, the journalCommitInterval is 100ms and the syncdelay (background flush) happens every 60 seconds.
If you adjust the syncdelay too low, you can end up creating a lot of additional I/O (and reduced performance) because writes to dirty pages can no longer be effectively batched together and the same page will be re-written multiple times. If you adjust the syncdelay higher, you can create I/O spikes (and reduced performance) with a large volume of changes being committed in a single batch.
The syncdelay and journalCommitInterval parameters are available as tunable settings because in some cases it may make sense to adjust them (for example, to temporarily help an under-provisioned I/O system by reducing write spikes). In a healthy production system, it is best to leave these settings as-is.
For more information, How MongoDB’s Journaling Works is a helpful read.
It's sensible to think about how to scale your application in future, but you can start with a simpler deployment that can grow with your requirements.
What should I choose? Having one database sounds good, as it will save me a lot of resources and make my life easier, but separating the services into many databases also sounds logical to me, since one service shouldn't need to see, or be able to access, collections that have nothing to do with its job.
My recommendation would be to set up a single MongoDB deployment (ideally starting with a replica set or hosted service with high availability & data redundancy):
A single deployment of MongoDB can have multiple databases; a MongoDB deployment per service adds unnecessary admin complexity.
For your use case it sounds reasonable to have a database per service and to set up a user per service, so actions are limited to direct interaction with the expected database.
A database per service also limits the potential impact of administrative actions (e.g. a foreground index build) that may create database-level contention.
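A per-service user scoped to its own database might look like the following mongosh sketch (the database, user, and password names are illustrative assumptions; run as a user with userAdmin privileges):

```javascript
// Run in mongosh against the "orders" service's database; names are illustrative.
use orders
db.createUser({
  user: "orders_svc",
  pwd: "changeme",   // use a real secret / secrets manager in practice
  roles: [ { role: "readWrite", db: "orders" } ]  // no access to other databases
})
```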
Later you can scale/tune the deployment, starting with options like directoryPerDB (e.g. faster storage for some databases) or partitioning your data using sharding.
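The directoryPerDB option is set at startup; a config sketch is below. Note that enabling it on an existing deployment requires migrating the data (e.g. an initial sync or dump/restore), since the on-disk layout changes.

```yaml
# mongod.conf fragment: one subdirectory per database under dbPath,
# which makes it possible to mount faster storage for specific databases
storage:
  dbPath: /var/lib/mongodb
  directoryPerDB: true
```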
I'm not sure how MongoDB handles many connections to the same instance, or to the same database; as you can see, it might get a lot of calls per second even if I have a pretty low number of users.
This is a more nuanced question as it depends on server resources as well as what those connections are doing and how/if your driver implements connection pooling. Ideally I would load test your actual application and determine how resource usage scales with increased user activity. If you are administering your own MongoDB deployment you should review the production notes and in particular make sure you have your ulimit settings increased appropriately to ensure you don't run out of file descriptors.
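A quick way to check the open-file limit that applies to the shell launching mongod is shown below. The 64000 figure follows the production notes' general recommendation, and the limits.conf lines are an illustrative sketch of how a persistent raise is usually done.

```shell
# Show the current open-file (nofile) soft limit for this shell
ulimit -n

# A persistent raise for the mongod user typically goes in
# /etc/security/limits.conf (illustrative):
#   mongod  soft  nofile  64000
#   mongod  hard  nofile  64000
```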
If you're concerned about ease of scaling you could also consider a hosted database-as-a-service solution (for example, MongoDB Atlas).
I read about MongoDB sharding; from what I understood it happens per database and not at the instance level. That's why I'm thinking about the one-database option, as it would be quite useful in my case. Am I right?
Sharding is initially enabled at the database level but the partitioning happens at the collection level based on a shard key. You can have a mix of sharded/unsharded databases and sharded/unsharded collections in the same MongoDB deployment.
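In mongosh the two levels look like this (the database name, collection name, and shard key are illustrative assumptions; these commands run against a mongos in a sharded cluster):

```javascript
// Database level: allow sharding for the "app" database
sh.enableSharding("app")

// Collection level: partition app.events by a shard key
sh.shardCollection("app.events", { userId: 1 })

// Other databases/collections in the same deployment can remain unsharded.
```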
With the "quiet" configuration parameter (not recommended) you can get rid of:
but the "end connection ..." messages still stay there and there is no way to remove them without changing the source code. You can always ask the developers to change the code to