What does horizontal scaling mean

couchdbnosqlscalabilityterminology

In database context, I have come across horizontal scalability as one of the advantages of the NOSQL databases. What does the term mean?

How would it compare to vertical scaling?

Best Answer

Horizontal Scaling
Horizontal Scaling is essentially building out instead of up. You don't go and buy a bigger beefier server and move all of your load onto it, instead you buy 1+ additional servers and distribute your load across them.

Horizontal scaling is used when you have the ability to run multiple instances on servers simultaneously. Typically it is much harder to go from 1 server to 2 servers then it is to go from 2 to 5, 10, 50, etc.

Once you've addressed the issues of running parallel instances, you can take great advantage of environments like Amazon EC2, Rackspace's Cloud Service, GoGrid, etc as you can bring instances up and down based on demand, reducing the need to pay for server power you aren't using just to cover those peak loads.

Relational Databases are one of the more difficult items to run full read/write in parallel.

I saw Damien Katz speaking about CouchDB at StackOverflow DevDays in Austin and one of his primary focuses for its creation was these parallel instances. As this has been a focus of it since day one, it would be much more capable of taking advantage of horizontal scaling.

Vertical Scaling
Vertical Scaling is the opposite, build up instead of out. Go and buy the strongest piece of hardware you can afford and put your application, database, etc on it.

Real World
Of course, both of these have their advantages and disadvantages. Often times a combination of these two are used for a final solution.

You may have your primary database where everyone writes to and reads real time data on a large piece of hardware. Then have distributed read only copies of the database for heavier data analysis and reporting where being up to the minute doesn't matter as much. Then the front end web application may be running on multiple web servers behind a load balancer.

Related Solutions

A Key/Value store database

Are you familiar with the concept of a Key/Value Pair? Presuming you're familiar with Java or C# this is in the language as a map/hash/datatable/KeyValuePair (the last is in the case of C#)

The way it works is demonstrated in this little sample chart:

Color        Red
Age          18
Size         Large
Name         Smith
Title        The Brown Dog

Where you have a key (left) and a value (right) ... notice it can be a string, int, or the like. Most KVP objects allow you to store any object on the right, because it's just a value.

Since you'll always have a unique key for a particular object that you want to return, you can just query the database for that unique key and get the results back from whichever node has the object (this is why it's good for distributed systems, since there's other things involved like polling for the first n nodes to return a value that match other nodes returns).

Now my example above is very simple, so here's a slightly better version of the KVP

user1923_color    Red
user1923_age      18
user3371_color    Blue
user4344_color    Brackish
user1923_height   6' 0"
user3371_age      34

So as you can see the simple key generation is to put "user" the userunique number, an underscore and the object. Again, this is a simple variation, but I think we begin to understand that so long as we can define the part on the left and have it be consistently formatted, that we can pull out the value.

Notice that there's no restriction on the key value (ok, there can be some limitations, such as text-only) or on the value property (there may be a size restriction) but so far I've not had really complex systems. Let's try and go a little further:

app_setting_width      450
user1923_color         Red
user1923_age           18
user3371_color         Blue
user4344_color         Brackish
user1923_height        6' 0"
user3371_age           34
error_msg_457          There is no file %1 here
error_message_1        There is no user with %1 name
1923_name              Jim
user1923_name          Jim Smith
user1923_lname         Smith
Application_Installed  true
log_errors             1
install_path           C:\Windows\System32\Restricted
ServerName             localhost
test                   test
test1                  test
test123                Brackish
devonly
wonderwoman
value                  key

You get the idea... all those would be stored in one massive "table" on the distributed nodes (there's math behind it all) and you would just ask the distributed system for the value you need by name.

At the very least, that's my understanding of how it all works. I may have a few things wrong, but that's the basics.

obligatory wikipedia link http://en.wikipedia.org/wiki/Associative_array

Are there any in-memory NoSQL databases

The issue with storing your database in memory is if you have any sort of memoery issue or server has to be restarted or anything of that issue all your memory will get flushed.

That is the reason people don't store their database in memory.

Now, there are caching tools which are in-memory and can work as a very simple database like memcached. That may meet your needs. If you look in to tmpfs and ramfs you can create a folder that exists in memory and move your files in there normally.

So, if you are working with MongoDB, mysql or whatever you work with, you can have the data folder live in the RAM folder. This will give the database super fast read and writes. Everything will be really fast. You will be limited to how much RAM you have minus the size of your OS and other things running.

Also, just be careful: MongoDB likes to store writes in memory until the disk has a chance to write, it so you may want to turn that feature off because it will be the same speed.

My recommendation is to work with memcached and then mix it with a normal database that lives on disk. The concept is done with PHP sessions on some systems.

http://mickeyben.com/2009/12/30/using-nginx-as-a-load-balancer.html

The basic way it works is, if your record is found in memcached, then it will not check the database. If it is not found, then do three(3) things:

check the db
send the data to memcached
send the data to calling function

Best Answer

Related Solutions

A Key/Value store database

Are there any in-memory NoSQL databases

Related Question