Designing a platform: one database or multiple databases

database-designdeployment

We are building a web platform that incorporates multiple services, each with its own underlying data. These services are being built independently following the principles of Service-Oriented Architecture, but they transact against potentially related data. We are considering whether these services should share one big database or each have their own database. (We are planning to use SQL Server 2008 Enterprise on a Windows 2008 cluster.)

Some of the advantages to each approach we have already considered include:

Single Database

  • Relating data from different services can be bound together by foreign key constraints
  • Analytic extracts are simpler to write and faster to execute
  • In the event of a disaster, restoring the platform to a consistent state is easier
  • For data that is referenced by multiple services, data cached by one service is likely to be used soon after by another service
  • Administration and monitoring is simpler and cheaper up front

Multiple Databases

  • Maintenance work, hardware problems, security breaches and so forth do not necessarily impact the whole platform
  • Assuming each database is on separate hardware, scaling up multiple machines yields more performance benefits than scaling up one big one

From an operational perspective, is it more advantageous that each service in this platform get its own database, or that they all go in the same database? What key factors inform an answer to this question?

Best Answer

In my opinion, the key differentiator of true SOA systems (over the pseudo SOA, ntier/distributed systems that are becoming ubiquitous) is that there should be zero interaction between discrete services. Where this is achieved, any application you compose from these services can and should be built to tolerate the failure of any consistuent part. A failure reduces functionality but service is maintained.

In this scenario it's logical, or required, to separate the underlying database for each service. If however you have services which are interdependent, there is little (perhaps nothing) to be gained from a split.

I'd recommend reading sites such as HighScalability.com which dig into the architectures adopted by the never-fail type websites. One of my favourites of late was the story of the Netflix Chaos Monkey which was mentioned on Coding Horror.

Addressing a couple of the points in your question:

In the event of a disaster, restoring the platform to a consistent state is easier.

This is true but you should perhaps be thinking about how to better decouple these services so this stops being an issue. Alternatively, there are methods to ensure synchronisation across multiple databases, transaction marks in SQL Server for example.

For data that is referenced by multiple services, data cached by one service is likely to be used soon after by another service.

Distributed cache solutions (memcached et al) could help here but you'd be violating the service independence principles. This would be comparable to having two services communicating with each other directly, or worse having a service access anothers data store, bypassing the service interface altogether. Inevitably data will be related and will be handed between services by the calling platform, the tricky decisions tend to be around which service will own which pieces of data. StackOverflow or Programmers sites might be better placed to help with the more general SOA issues.

Assuming each database is on separate hardware, scaling up yields more performance benefits.

Certainly it can be cheaper to scale out across multiple lower spec machines than to scale up a single machine. Although, the lower hardware costs may be dwarfed in the total cost of ownership when the soft costs of additional development effort and operational complexity are factored in.

If this isn't SOA and you just have a case where the component services of this platform are being built by different teams/suppliers for logistical reasons, stick with a single database and completely ignore everything above! :)