In my opinion, the key differentiator of true SOA systems (over the pseudo-SOA, n-tier/distributed systems that are becoming ubiquitous) is that there should be zero interaction between discrete services. Where this is achieved, any application you compose from these services can and should be built to tolerate the failure of any constituent part. A failure reduces functionality, but service is maintained.
In this scenario it is logical, if not required, to separate the underlying database for each service. If, however, you have services which are interdependent, there is little (perhaps nothing) to be gained from a split.
I'd recommend reading sites such as HighScalability.com, which dig into the architectures adopted by the never-fail type of website. One of my favourites of late was the story of the Netflix Chaos Monkey, which was mentioned on Coding Horror.
Addressing a couple of the points in your question:
In the event of a disaster, restoring the platform to a consistent state is easier.
This is true, but you should perhaps be thinking about how to better decouple these services so this stops being an issue. Alternatively, there are methods to ensure synchronisation across multiple databases: marked transactions in SQL Server, for example.
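As a minimal sketch of the SQL Server approach (database, table, and file names here are hypothetical): a named transaction opened WITH MARK records a recovery point in the transaction log of every database it touches, so all the logs can later be restored to the same consistent point.

```sql
-- Mark a cross-database update in the logs of both databases.
BEGIN TRANSACTION NightlySync WITH MARK 'Nightly sync point';
    UPDATE CustomersDb.dbo.Customers SET Status = 'Active' WHERE Id = 42;
    UPDATE BillingDb.dbo.Invoices    SET Paid = 1 WHERE CustomerId = 42;
COMMIT TRANSACTION NightlySync;

-- After a disaster (and after restoring full backups WITH NORECOVERY),
-- roll every database's log forward to the same mark:
RESTORE LOG CustomersDb FROM DISK = 'customers.trn'
    WITH STOPATMARK = 'NightlySync';
RESTORE LOG BillingDb FROM DISK = 'billing.trn'
    WITH STOPATMARK = 'NightlySync';
```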
For data that is referenced by multiple services, data cached by one service is likely to be used soon after by another service.
Distributed cache solutions (memcached et al.) could help here, but you'd be violating the service-independence principle. This would be comparable to having two services communicate with each other directly, or worse, having a service access another's data store, bypassing the service interface altogether. Inevitably data will be related and will be handed between services by the calling platform; the tricky decisions tend to be around which service will own which pieces of data. The StackOverflow or Programmers sites might be better placed to help with the more general SOA issues.
Assuming each database is on separate hardware, scaling up yields more performance benefits.
Certainly it can be cheaper to scale out across multiple lower-spec machines than to scale up a single machine, although the lower hardware costs may be dwarfed in the total cost of ownership once the soft costs of additional development effort and operational complexity are factored in.
If this isn't SOA and you just have a case where the component services of this platform are being built by different teams/suppliers for logistical reasons, stick with a single database and completely ignore everything above! :)
You seem to use the word "database" when you mean "table." One database for all data is best. Most modern databases (PostgreSQL, MySQL, Oracle, DB2, SQL Server, etc.) can handle a large amount of data in one database. It sounds like you might want to access all customer and/or all invoice information together, so keep them together in one database.
Data should be split into separate tables to normalize it as needed. Fully normalized data (each piece of information stored only once) is recommended by every database textbook I have read. Read about (or at least Google) "database normalization" to understand that concept well.
One normalized approach would be to have one customer table indexed by a customer id with all fields common to every customer. The customer table would also have a field identifying the customer type: customer1, 2, ... 10. A separate table would hold customer-1-type fields, indexed by customer id. Another table would have customer-2-type fields, etc. The application using the database would be designed to look for data in the customer-1 table if the customer-type is customer1, look for data in customer-2 table if customer-type is customer2, etc.
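To make this supertype/subtype layout concrete, here's a minimal sketch in generic SQL (the table and column names are illustrative, not from the question):

```sql
-- Common fields for every customer, plus a discriminator column.
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    name          VARCHAR(100) NOT NULL,
    customer_type SMALLINT NOT NULL  -- 1..10, identifies the subtype table
);

-- Fields specific to type-1 customers; one row per type-1 customer.
CREATE TABLE customer_type_1 (
    customer_id  INTEGER PRIMARY KEY REFERENCES customer (customer_id),
    credit_limit NUMERIC(12,2)
);

-- ...and so on: customer_type_2, customer_type_3, etc.
```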
Presumably, the relationship between customer and invoice is one-to-many. An invoice table could have a foreign key linking to the customer table. If invoice data varies by customer type, then that data could be placed in separate tables: invoice-customer-type-1, invoice-customer-type-2, etc. When gathering invoice data, find the customer id for the invoice from the invoice table, look up the customer type from the customer table, then get data from the appropriate invoice-customer-# table.
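Continuing the sketch above (again with illustrative names — discount_code stands in for whichever fields vary by customer type), the lookup described here is a pair of joins:

```sql
-- Invoices reference customers via a foreign key.
CREATE TABLE invoice (
    invoice_id  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
    total       NUMERIC(12,2)
);

-- For a given invoice, pull the type-1-specific invoice fields
-- only when the owning customer is of type 1.
SELECT i.invoice_id, i.total, it1.discount_code
FROM invoice i
JOIN customer c  ON c.customer_id = i.customer_id
JOIN invoice_customer_type_1 it1 ON it1.invoice_id = i.invoice_id
WHERE c.customer_type = 1
  AND i.invoice_id = 1001;
```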
Also, the tables in the database should be structured according to how the data will be used. For data entry and updating, a normalized structure is good. For printing bills or other reporting, you might want to create temporary de-normalized tables to speed up the process. A temporary table can be faster because you avoid making multiple table joins every time a bill is printed.
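As a sketch of that idea (SQL Server-style temporary table; names carry over from the illustrative schema above), the join cost is paid once when the table is built rather than once per bill:

```sql
-- Build a flattened billing table once, paying the join cost up front.
SELECT c.customer_id, c.name, i.invoice_id, i.total
INTO #billing_report
FROM customer c
JOIN invoice  i ON i.customer_id = c.customer_id;

-- Each bill is then a simple single-table read.
SELECT * FROM #billing_report WHERE customer_id = 42;
```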
Get a textbook and study database design, but in the meantime, I hope this helps.
Best Answer
For resilience, you want your data duplicated on as many servers as possible, so that if one fails, the others can share the load.
"classic" oltp databases performance issues raise from concurrent transactions. Due mainly to locks, time/transaction grows exponentially. Hence, a simple way to mitigate this issue is to have multiple databases, as there will be less concurrent transactions by database. Thus, this is an attractive duct tape (cheap and easy, at least compared to alternatives).
One thing to consider is that historically the bottleneck has been storage (disks), so databases tend to minimise disk access by consuming a lot of CPU and memory. Partitioning tables also helps reduce the load on disks.
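As a sketch of table partitioning (PostgreSQL declarative-partitioning syntax; the table is illustrative), range-partitioning by date keeps each partition and its indexes small, so date-filtered queries touch far fewer disk pages:

```sql
-- Parent table is partitioned by invoice date.
CREATE TABLE invoices (
    invoice_id BIGINT,
    created_at DATE NOT NULL,
    total      NUMERIC(12,2)
) PARTITION BY RANGE (created_at);

-- One child table per year; queries filtered by date scan only one child.
CREATE TABLE invoices_2023 PARTITION OF invoices
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE invoices_2024 PARTITION OF invoices
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
```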
Multiple databases also allow most of the data to be cached in memory, which seriously improves performance too.
There are other options. Some databases can span multiple machines, or you can pool machines into a virtual one, but that only goes so far too.
The current trend is to split OLTP and OLAP databases. This makes sense: OLTP requires fast row access, while OLAP requires high-throughput column reads.
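To illustrate the access-pattern difference (generic SQL; the orders table is illustrative):

```sql
-- OLTP: fetch one whole row by primary key; a row store with a
-- B-tree index answers this in a handful of page reads.
SELECT * FROM orders WHERE order_id = 123456;

-- OLAP: aggregate two columns across millions of rows; a column
-- store reads only those columns instead of every full row.
SELECT order_date, SUM(total)
FROM orders
GROUP BY order_date;
```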
Another trick is to have two databases, one in read/write mode and one in read-only mode, in a master/slave configuration. Most reads are done on the slave, and changes on the master are pushed to the slave. This relieves the burden on the master database, which deals with the write transactions.
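A minimal sketch of the routing, assuming a PostgreSQL streaming-replication setup (which connection the application opens is up to the application; the table is illustrative):

```sql
-- On the master (read/write connection): all writes go here.
INSERT INTO orders (customer_id, total) VALUES (42, 99.95);

-- On the slave (read-only connection): reporting and most reads go here.
-- In PostgreSQL, pg_is_in_recovery() returns true on a standby, which
-- is a handy sanity check that you really connected to the replica.
SELECT pg_is_in_recovery();

SELECT customer_id, SUM(total)
FROM orders
GROUP BY customer_id;
```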
If you observe your database carefully, you will notice that very few operations really need ACID transactions. That is when it might be interesting to leave the classical model behind and build two or three systems: for example, one for tickets (which needs transactions, as you don't want to sell out-of-stock products), and another system for the rest. If you update a product price and the old price is still served for 0.1 seconds, it does not matter much (except for an auction system, of course). The point is that with a traditional system you can't sell tickets while you update the price, as the row will be locked, and it will be unavailable for much longer (1-10 seconds), because your traditional DB (which is now massive) is much slower.
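A sketch of the contention being described, as two concurrent sessions against a hypothetical products table (generic SQL):

```sql
-- Session 1: a price update holds a row lock until it commits.
BEGIN;
UPDATE products SET price = 19.99 WHERE product_id = 1;
-- ...transaction still open: the row stays locked...

-- Session 2 (meanwhile): selling the same product blocks here,
-- waiting on session 1's row lock; no sale completes until it commits.
BEGIN;
UPDATE products SET stock = stock - 1 WHERE product_id = 1;
COMMIT;
```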
Typically the transactional DB will be in-memory (based on kernel locks, or Multi-Paxos), with a "NoSQL" DB serving reads for 99% of the rest. The in-memory logs are then merged into the NoSQL database. Combined with other tricks, this scales linearly, and hence can grow in ways traditional databases can achieve only with difficulty.
This is more complicated, and requires analysis, learning new paradigms, and massive time to implement. To my knowledge, it is used mainly by "giants" like Google or Twitter. This is the basis of the "big data" movement.
If your company uses a traditional SQL database, you're better off using all the tricks available (sharding, partitioning, read/write DBs, VMs) before changing your full ecosystem.