In my opinion, the key differentiator of true SOA systems (over the pseudo-SOA, n-tier/distributed systems that are becoming ubiquitous) is that there should be zero interaction between discrete services. Where this is achieved, any application you compose from these services can and should be built to tolerate the failure of any constituent part. A failure reduces functionality, but service is maintained.
In this scenario it's logical, or even required, to separate the underlying database for each service. If, however, you have services which are interdependent, there is little (perhaps nothing) to be gained from a split.
I'd recommend reading sites such as HighScalability.com which dig into the architectures adopted by the never-fail type websites. One of my favourites of late was the story of the Netflix Chaos Monkey which was mentioned on Coding Horror.
Addressing a couple of the points in your question:
In the event of a disaster, restoring the platform to a consistent
state is easier.
This is true, but you should perhaps be thinking about how to better decouple these services so this stops being an issue. Alternatively, there are methods to ensure synchronisation across multiple databases; marked transactions in SQL Server, for example.
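To make the synchronisation point concrete, here is a minimal sketch of committing one logical operation to two separate service databases, using in-memory SQLite connections as stand-ins (the table names and the `place_order` helper are illustrative assumptions, not anything from the original question, and this is far weaker than SQL Server's marked transactions or a real distributed transaction coordinator):

```python
import sqlite3

# Two services with separate databases (in-memory SQLite stand-ins)
# that must stay in sync for one logical operation.
orders = sqlite3.connect(":memory:")
billing = sqlite3.connect(":memory:")
orders.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
billing.execute("CREATE TABLE invoices (order_id INTEGER, amount REAL)")

def place_order(order_id, amount):
    """Commit to both databases only if both writes succeed."""
    try:
        orders.execute("INSERT INTO orders VALUES (?, ?)", (order_id, amount))
        billing.execute("INSERT INTO invoices VALUES (?, ?)", (order_id, amount))
    except sqlite3.Error:
        orders.rollback()
        billing.rollback()
        raise
    # Not a true two-phase commit: a crash between these two commits can
    # still leave the databases inconsistent, which is exactly why features
    # like marked transactions exist.
    orders.commit()
    billing.commit()

place_order(1, 9.99)
```

The fragile window between the two commits is the reason coordinated recovery points (such as marked transactions) are worth having when databases must be restored to a mutually consistent state.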
For data that is referenced by multiple services, data cached by one
service is likely to be used soon after by another service.
Distributed cache solutions (memcached et al.) could help here, but you'd be violating the service-independence principle. This would be comparable to having two services communicate with each other directly, or worse, having one service access another's data store, bypassing the service interface altogether. Inevitably, data will be related and will be handed between services by the calling platform; the tricky decisions tend to be around which service will own which pieces of data. The StackOverflow or Programmers sites might be better placed to help with the more general SOA issues.
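One way to get the caching benefit without violating service ownership is a cache-aside pattern where the cache only ever fills through the owning service's public interface. A minimal sketch (the dict-backed cache and the `product_service` function are illustrative assumptions standing in for memcached and a real service API):

```python
# Hypothetical cache-aside sketch: the cache sits in front of the owning
# service's interface, so consumers still go through that interface rather
# than reading the owner's data store directly.
cache = {}  # stand-in for a shared cache such as memcached

def get_product(product_id, fetch_from_service):
    """Return cached data, falling back to the owning service's API."""
    if product_id in cache:
        return cache[product_id]
    value = fetch_from_service(product_id)  # the service's public interface
    cache[product_id] = value
    return value

# Simulated service call, recording how often the service is actually hit.
calls = []
def product_service(product_id):
    calls.append(product_id)
    return {"id": product_id, "name": "widget"}

get_product(42, product_service)
get_product(42, product_service)  # second call served from the cache
```

Because every cache miss goes through the service interface, the owning service remains the single authority over its data.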
Assuming each database is on separate hardware, scaling up yields more
performance benefits.
Certainly it can be cheaper to scale out across multiple lower-spec machines than to scale up a single machine. However, the lower hardware costs may be dwarfed in the total cost of ownership once the soft costs of additional development effort and operational complexity are factored in.
If this isn't SOA and you just have a case where the component services of this platform are being built by different teams/suppliers for logistical reasons, stick with a single database and completely ignore everything above! :)
You are getting at something that is missing from standard normalization discussions, namely the constraint dependencies. In general wider tables provide greater possibilities here than narrower tables. So my view is that the sorts of questions you are asking in fact highlight good reasons to denormalize. I would go with your first solution (the one you are leaning towards right now).
In my view, good database design generally normalizes as far as possible but ensures that all columns necessary for proper data constraints are included. You can do some of this with composite foreign keys if you don't mind adding additional unique constraints on the parent tables. Leveraging data constraints is an important part of database design, and one should not sacrifice it for the sake of normalization that looks good in theory.
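The composite-foreign-key technique above can be sketched as follows, here with SQLite via Python for a runnable example (the `customers`/`orders` tables and the `region` column are illustrative assumptions; the point is that repeating a column on the child table plus an extra UNIQUE constraint on the parent lets the database enforce a rule pure normalization would make unenforceable):

```python
import sqlite3

# Enforce "an order's region must match its customer's region" declaratively:
# region is kept (denormalized) on orders, and the parent gains an extra
# UNIQUE constraint so the composite foreign key has something to reference.
db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")
db.execute("""
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        region TEXT NOT NULL,
        UNIQUE (id, region)      -- additional unique constraint on the parent
    )""")
db.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        region TEXT NOT NULL,    -- repeated here purely for the constraint
        FOREIGN KEY (customer_id, region) REFERENCES customers (id, region)
    )""")
db.execute("INSERT INTO customers VALUES (1, 'EU')")
db.execute("INSERT INTO orders VALUES (10, 1, 'EU')")      # OK: regions match

try:
    db.execute("INSERT INTO orders VALUES (11, 1, 'US')")  # region mismatch
    mismatch_rejected = False
except sqlite3.IntegrityError:
    mismatch_rejected = True
```

The mismatched insert fails at the database level, with no application code involved, which is the payoff being argued for.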
Best Answer
Well, it really depends on schema, volume, etc. What exactly is a shop storing? How is it different from storing data about 50,000 cats or 50,000 products or 50,000 wingnuts?
There are several reasons (other than just the size aspect on its own) why you might not want to store data for 50,000 different customers in a single database, if indeed the data can be completely segregated by customer (not including lookup tables like zipcodes or application-specific tables, which could go into a single, central database):
if one customer outgrows the application, there is no easy way to extract just their data and move it to another instance, server, etc. to scale out, unless you plan ahead and partition on something like CustomerID and have 50,000 filegroups (you're limited to 15,000 partitions anyway, or 1,000 if you're on an older version of SQL Server, and having too many filegroups can be disastrous). Also note that partitioning requires Enterprise Edition. If it turns out that all of your customers are simply too big for this instance, scaling out means getting new hardware and moving the entire database there (and potentially doing that again down the road).
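By contrast, a database-per-customer layout makes extracting or relocating one customer a matter of moving one database. A minimal routing sketch (in-memory SQLite connections stand in for per-customer database files; the `db_for_customer` helper and table names are illustrative assumptions):

```python
import sqlite3

# Hypothetical database-per-customer routing: each customer maps to its own
# database, so scaling one customer out means moving one database rather
# than partitioning shared tables on CustomerID.
connections = {}

def db_for_customer(customer_id):
    """Return (and lazily create) the database for one customer."""
    if customer_id not in connections:
        conn = sqlite3.connect(":memory:")  # e.g. a per-customer file or server
        conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
        connections[customer_id] = conn
    return connections[customer_id]

db_for_customer("acme").execute("INSERT INTO orders VALUES (1, 5.0)")
db_for_customer("globex").execute("INSERT INTO orders VALUES (1, 7.0)")
# Each customer's data lives in a physically separate database.
```

The routing layer is the only shared piece; everything below it is already segregated per customer.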
deleting a customer can be equally painful, as you will have to delete some % of rows from very big tables, and that will not be cheap.
you will likely have a wide distribution of customer data (one customer with a billion rows, another with 5,000). This can lead to problems such as parameter sniffing and poor cardinality estimates and plan quality (since you will likely be re-using the same plans for the same queries against very different data sets).
all of your customers are subject to the exact same SLAs and HA/DR plans. You either have the entire database in full recovery mode with n-minute log backups, or you are in simple and rely on full+diff backups. If you have to revert because of a customer error, or need to recover the database to a point in time, that affects every single customer.
there is potential for errors in data retrieval: bugs in WHERE clauses, for example, could lead to one customer seeing another customer's data, or all of the other customers' data.
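That failure mode is easy to demonstrate. In the sketch below (SQLite via Python, with an illustrative `invoices` table), the only thing standing between one tenant and everyone's rows is a single WHERE clause:

```python
import sqlite3

# Single shared database: one forgotten WHERE clause on the tenant column
# exposes every customer's rows. Table and column names are illustrative.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE invoices (customer_id INTEGER, amount REAL)")
db.executemany("INSERT INTO invoices VALUES (?, ?)",
               [(1, 10.0), (1, 20.0), (2, 99.0)])

def invoices_for(customer_id):
    """Correct version: always filter by tenant, via a bound parameter."""
    return db.execute(
        "SELECT amount FROM invoices WHERE customer_id = ?",
        (customer_id,)).fetchall()

# The buggy variant, with no tenant filter, returns every customer's rows:
leaked = db.execute("SELECT amount FROM invoices").fetchall()
safe = invoices_for(1)
```

With separate databases per customer, this class of bug becomes a connection-routing error rather than a silent data leak.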
there may be legal implications (some companies will have strict requirements in place that you do not place their data in the same database as any other company, and particularly that of their competitors).
if security of any one customer's data is important, then achieving that is much easier using database separation than separation within a table.
Some advantages to having each customer in a separate database (or at least having multiple databases, each for a group of customers):
removing a single customer's data is as simple as DROP DATABASE.
Some drawbacks: