The only time that moving the journal is an absolute recommendation is if you have to use a direct NFS mount - NFS is not recommended for MongoDB in general but in particular it does not play well with the journal.
In general, the journal will have quite a different access pattern to the rest of your data (sequential versus random access). Hence it is often a good idea to separate the journal and the data from a performance perspective. Note that this is a very broad generalization and will vary depending on your usage of MongoDB, but will generally be true.
Since your questions are largely about EC2 and EBS, NFS will not play a part here, but I thought it worth mentioning for the broader context. On to specifics.
For EC2/EBS, you will generally only need to separate the journal out if you are seeing write IO contention (or overall IO contention given the nature of EBS) - it will move IO to a different disk and free up some capacity on the data disk. Of course, with EBS your IO is also dependent on the network IO available on the instances and that is dependent on your instance size (and whether you have opted for P-IOPS), hence a lot of variables.
If you are seeing high IOWait times and your disk looks write bound (see iostat), this is something you should consider, but only as a temporary measure, because it will only be a small tweak compared to increasing available IO. There are plenty of options available there depending on your starting point, like adding more EBS drives to the RAID or taking advantage of P-IOPS provisioning. You can also occasionally get better performance by changing instance type so that you have less network contention. Each of these may be more effective than moving the journal and cause fewer headaches.
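If you want a quick number for "high IOWait" without reaching for iostat, here is a minimal sketch (Linux only) that computes the iowait share of CPU time from two samples of `/proc/stat`. The function names are my own, and where you draw the line for "high" is up to you; treat it as an illustration rather than a monitoring tool.

```python
def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as a list of ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two /proc/stat samples."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    # Guest time is already counted inside user/nice, so only the first 8 are summed.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


# Usage (on a Linux host): read /proc/stat, wait a few seconds, read it again,
# then call iowait_percent(first_sample, second_sample).
```

A persistently high value alongside a write-heavy iostat profile is the situation described above; a single spike on its own says very little.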
To explain, there are other considerations here - snapshots for one. Thanks to the journal you no longer have to fsync and lock the database to get a consistent snapshot (be it EBS or LVM or other). However the journal has to be included in the snapshot for that to be the case. Hence whatever node you use for backing up, if you intend to snapshot without taking downtime for that node, then you need to make sure the journal is included.
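As a rough sanity check before relying on a snapshot, you can confirm that the journal directory still resolves to the same device as the data directory. This is only a sketch: the function name is mine, and the paths in the usage comment are just the common defaults, so substitute your own.

```python
import os


def on_same_device(dbpath, journal_path):
    """True if both paths resolve to the same device.

    os.stat follows symlinks, so a journal directory that has been
    symlinked onto another disk is reported as a different device.
    """
    return os.stat(dbpath).st_dev == os.stat(journal_path).st_dev


# Usage (example paths, adjust to your deployment):
#   on_same_device("/data/db", "/data/db/journal")
```

If this returns False, a snapshot of the data volume alone will not contain the journal. Note that it only compares filesystems; if the filesystem sits on a RAID of several EBS volumes, the snapshot still has to cover every member volume.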
Finally, one of the uses of the journal is to facilitate recovery from an unclean shutdown, such as an OS level crash/reboot. If the journal is on the ephemeral (instance store) disk in EC2, it can be lost in such an event: instance store data does not survive an instance stop/start or a failure of the underlying host, and hence the journal would not be useful in that context. Any crash that loses the journal in such a configuration would therefore require a resync from scratch or a restore from backup/snapshot.
Overall, like most configuration decisions, you have to weigh the pros and cons and pick the solution most relevant for your use case. Hopefully this will give you enough information to make an informed choice.
Best Answer
To answer the question, generally the best option is the S.A.M.E. for all databases!
Stripe And Mirror Everything - also known as RAID 10 (or RAID 1+0).
As the oracle-base link shows (an absolutely super site for all things Oracle, BTW), this is the preferred general RAID level for Oracle, with the proviso that one can delve a bit deeper and use other RAID levels for different file types (data, logs, control files...). See the table in the link below, but this is the take-home message (IMHO), in the RAID Levels section:
So, 1+0 is definitely the preferred option.
Microsoft's page RAID Levels and SQL Server also says:
Note that last bit, "but at the expense of using two times as many disks": one gets nothing for nothing! "Yae cannae beet the laws o' physics, Jim..." (with apologies to Gene Roddenberry).
For completeness, Severalnines (a top PostgreSQL consulting company) says:
and for MySQL, consider Percona's advice (Percona are a very highly regarded MySQL consulting group with their own fork of the server):
So, again generally, RAID 1+0 is considered the optimal solution. However, if you read around this topic, you will see that there are issues such as expense - no organisation has infinite resources, so decisions are sometimes made on budgetary grounds not to go with RAID 1+0 but rather RAID 5.
This, IMHO, is a huge mistake - it's a false economy. You might be saving some money on disks, but your employees' and customers' time (and sanity) are valuable resources in their own right (painful past experience, but I've had therapy and I'm OK now...).
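To put rough numbers on that trade-off, here is a small sketch comparing usable capacity and the textbook small-random-write penalty for the two levels. The function name and the example disk counts are mine, and the penalty figures (2 back-end IOs per write for RAID 1+0, 4 for RAID 5) are the usual rule-of-thumb values; real controllers with write caches will behave differently.

```python
def raid_summary(level, disks, disk_tb):
    """Usable capacity (TB) and rule-of-thumb write penalty for RAID 10 or RAID 5."""
    if level == "10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        usable_tb = (disks // 2) * disk_tb   # half the disks hold mirror copies
        write_penalty = 2                    # each write goes to both mirrors
    elif level == "5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        usable_tb = (disks - 1) * disk_tb    # one disk's worth of parity
        write_penalty = 4                    # read data, read parity, write data, write parity
    else:
        raise ValueError("only levels '10' and '5' are covered in this sketch")
    return {"usable_tb": usable_tb, "write_penalty": write_penalty}


# Example: eight 2 TB disks
#   raid_summary("10", 8, 2.0)  gives 8.0 TB usable, penalty 2
#   raid_summary("5", 8, 2.0)   gives 14.0 TB usable, penalty 4
```

So RAID 5 buys you more space from the same disks, but on this rule of thumb each small write costs twice as many back-end IOs, which is exactly where a write-heavy database feels it.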
So, ideally, it is better to spread data across disks. As for the part of your question about running more than one database on the same server, questions of multi-tenancy arise.
For example, you have to ask yourself "If I take down the system, that's two clients gone; would I rather be able to take them down separately if required?". That's a question which only the stakeholders in your own organisation can answer: Management, Customers, DBAs...
Personally, I would keep separate clients' data as separate as possible, but again you have budget considerations: a separate disk subsystem for each client, software licensing issues and the like. My advice is to read deeply around the area in order to be able to provide a sound strategy for your own organisation given the resources to hand.
p.s. +1 for an interesting question and welcome to the forum!