Mysql – Two types of data, so two type of databases

database-designdatabase-recommendationMySQLnosqlrdbms

For a social network site, I need to propose a DB. The application is written in Java & will be hosted on VPS(s) initially.

Broadly classified there is two type of data to be stored at backend:

 1. dynamic lists which are:

     - frequently appended to   
     - frequently read       
     - sometimes reduced    

 2. fixed set of data keyed by a primary key(sometimes modified).

 "For serving any page, I need to have access to both kind of data!"

As demanded by every other SN site, we need to consider for easy scaling in the future, but in addition to that our team & resources are also very very limited. We would like to start with a 1 or 2 medium sized VPS(s) & add more servers as data & load grows.

Personally I usually prefer something that is used by a large community, so ofcourse MySQL is big option but it doesn't fit our entire needs. It could be used for 2nd kind of data(among the list above) ie for storing fixed set of columns/data but not ideal for storing dynamic lists(ie 1st kind).
So should I use a 2nd database just to fit in only that type of data (two database each containing only data best suited for them)? (Some suggested Cassandra to store the 2nd kind of data.)
What is the way to go ?

Best Answer

If the two families of data are related then I would generally recommend keeping the in the same database - it makes maintaining integrity easier, for instance taking a reliable and consistent backup is (depending on database engine of course) a single operation.

Of course if the two families of data are very loosely coupled anyway this need not be a concern, and using different databases or even completely different technologies for each becomes more likely to be a good solution particularly as you scale out (your first hardware scaling being too put each family of data on its own (virtual) machine. Remember though that if you do split the data over more than one storage technology you are increasing the amount of knowledge needed to maintain the system.

In MSSQL or similar you could break data with significantly different IO patterns into separate file groups in the same database and store them on different drives if needed for performance - this way all the data can be dealt with atomically for transaction or backup purposes and so forth while still being segregated for performance reasons.