Are you familiar with the concept of a Key/Value Pair? Presuming you're familiar with Java or C# this is in the language as a map/hash/datatable/KeyValuePair (the last is in the case of C#)
The way it works is demonstrated in this little sample chart:
Color Red
Age 18
Size Large
Name Smith
Title The Brown Dog
Where you have a key (left) and a value (right) ... notice it can be a string, int, or the like. Most KVP objects allow you to store any object on the right, because it's just a value.
Since you'll always have a unique key for a particular object that you want to return, you can just query the database for that unique key and get the results back from whichever node has the object (this is why it's good for distributed systems, since there's other things involved like polling for the first n nodes to return a value that match other nodes returns).
Now my example above is very simple, so here's a slightly better version of the KVP
user1923_color Red
user1923_age 18
user3371_color Blue
user4344_color Brackish
user1923_height 6' 0"
user3371_age 34
So as you can see the simple key generation is to put "user" the userunique number, an underscore and the object. Again, this is a simple variation, but I think we begin to understand that so long as we can define the part on the left and have it be consistently formatted, that we can pull out the value.
Notice that there's no restriction on the key value (ok, there can be some limitations, such as text-only) or on the value property (there may be a size restriction) but so far I've not had really complex systems. Let's try and go a little further:
app_setting_width 450
user1923_color Red
user1923_age 18
user3371_color Blue
user4344_color Brackish
user1923_height 6' 0"
user3371_age 34
error_msg_457 There is no file %1 here
error_message_1 There is no user with %1 name
1923_name Jim
user1923_name Jim Smith
user1923_lname Smith
Application_Installed true
log_errors 1
install_path C:\Windows\System32\Restricted
ServerName localhost
test test
test1 test
test123 Brackish
devonly
wonderwoman
value key
You get the idea... all those would be stored in one massive "table" on the distributed nodes (there's math behind it all) and you would just ask the distributed system for the value you need by name.
At the very least, that's my understanding of how it all works. I may have a few things wrong, but that's the basics.
obligatory wikipedia link http://en.wikipedia.org/wiki/Associative_array
The issue with storing your database in memory is if you have any sort of memoery issue or server has to be restarted or anything of that issue all your memory will get flushed.
That is the reason people don't store their database in memory.
Now, there are caching tools which are in-memory and can work as a very simple database like memcached. That may meet your needs. If you look in to tmpfs and ramfs you can create a folder that exists in memory and move your files in there normally.
So, if you are working with MongoDB, mysql or whatever you work with, you can have the data folder live in the RAM folder. This will give the database super fast read and writes. Everything will be really fast. You will be limited to how much RAM you have minus the size of your OS and other things running.
Also, just be careful: MongoDB likes to store writes in memory until the disk has a chance to write, it so you may want to turn that feature off because it will be the same speed.
My recommendation is to work with memcached and then mix it with a normal database that lives on disk. The concept is done with PHP sessions on some systems.
http://mickeyben.com/2009/12/30/using-nginx-as-a-load-balancer.html
The basic way it works is, if your record is found in memcached, then it will not check the database. If it is not found, then do three(3) things:
- check the db
- send the data to memcached
- send the data to calling function
:)
Best Answer
Horizontal Scaling
Horizontal Scaling is essentially building out instead of up. You don't go and buy a bigger beefier server and move all of your load onto it, instead you buy 1+ additional servers and distribute your load across them.
Horizontal scaling is used when you have the ability to run multiple instances on servers simultaneously. Typically it is much harder to go from 1 server to 2 servers then it is to go from 2 to 5, 10, 50, etc.
Once you've addressed the issues of running parallel instances, you can take great advantage of environments like Amazon EC2, Rackspace's Cloud Service, GoGrid, etc as you can bring instances up and down based on demand, reducing the need to pay for server power you aren't using just to cover those peak loads.
Relational Databases are one of the more difficult items to run full read/write in parallel.
I saw Damien Katz speaking about CouchDB at StackOverflow DevDays in Austin and one of his primary focuses for its creation was these parallel instances. As this has been a focus of it since day one, it would be much more capable of taking advantage of horizontal scaling.
Vertical Scaling
Vertical Scaling is the opposite, build up instead of out. Go and buy the strongest piece of hardware you can afford and put your application, database, etc on it.
Real World
Of course, both of these have their advantages and disadvantages. Often times a combination of these two are used for a final solution.
You may have your primary database where everyone writes to and reads real time data on a large piece of hardware. Then have distributed read only copies of the database for heavier data analysis and reporting where being up to the minute doesn't matter as much. Then the front end web application may be running on multiple web servers behind a load balancer.