User terms surrogate key and natural key for primary key as a variant.
Surrogate Key:
Surrogate keys are keys that have no
“business” meaning and are solely used
to identify a record in the table.
Such keys are either database
generated (example: Identity in SQL
Server, Sequence in Oracle,
Sequence/Identity in DB2 UDB etc.) or
system generated values (like
generated via a table in the schema).
Natural Key:
Keys are natural if the attribute it
represents is used for identification
independently of the database schema.
What this basically means is that the
keys are natural if people use them
example: Invoice-Numbers, Tax-Ids, SSN
etc.
Surrogate Keys vs Natural Keys for Primary Key
Horizontal Scaling
Horizontal Scaling is essentially building out instead of up. You don't go and buy a bigger beefier server and move all of your load onto it, instead you buy 1+ additional servers and distribute your load across them.
Horizontal scaling is used when you have the ability to run multiple instances on servers simultaneously. Typically it is much harder to go from 1 server to 2 servers then it is to go from 2 to 5, 10, 50, etc.
Once you've addressed the issues of running parallel instances, you can take great advantage of environments like Amazon EC2, Rackspace's Cloud Service, GoGrid, etc as you can bring instances up and down based on demand, reducing the need to pay for server power you aren't using just to cover those peak loads.
Relational Databases are one of the more difficult items to run full read/write in parallel.
I saw Damien Katz speaking about CouchDB at StackOverflow DevDays in Austin and one of his primary focuses for its creation was these parallel instances. As this has been a focus of it since day one, it would be much more capable of taking advantage of horizontal scaling.
Vertical Scaling
Vertical Scaling is the opposite, build up instead of out. Go and buy the strongest piece of hardware you can afford and put your application, database, etc on it.
Real World
Of course, both of these have their advantages and disadvantages. Often times a combination of these two are used for a final solution.
You may have your primary database where everyone writes to and reads real time data on a large piece of hardware. Then have distributed read only copies of the database for heavier data analysis and reporting where being up to the minute doesn't matter as much. Then the front end web application may be running on multiple web servers behind a load balancer.
Best Answer
Etymology
"Collation" is probably best defined on etymology.com,
It hasn't changed at all in the past 600 years. "collate" means "to bring together" and whether it's the ordering of a book, chronologically or otherwise in a database, it's all the same.
Computer Science
As it applies to Computer Science, what we're doing is separating the storage mechanism from the ordering. You may have heard of ASCII-betical. That refers to a comparison based on the binary encoding of ASCII characters. In that system, storage and collation are two in the same. If the ASCII-standardized "encoding" ever changed the "collation" (order) would change too.
POSIX started to break that with
LC_COLLATE
. But as we move into Unicode a consortium emerged to standardize collations as well: ICU.SQL
In the SQL spec there are two extensions to provide collations,
F690
: “Collation support:Includes
collation name
,collation clause
,collation definition
anddrop collation
.F692
: Extended collation support,Includes attaching a different default collation to a column in the schema.
Essentially these provide the ability to
CREATE
andDROP
collations, to specify them for operators and sorts, and to define a default for columns.For more information on what SQL has to offer check out,
Examples
Rather than pasting a limited example, here is the PostgreSQL test suite it's pretty extensive. Check out at least the first link and look for
'Türkiye' COLLATE "tr-x-icu" ILIKE '%KI%' AS "false"
collate.icu.utf8.out
collate.linux.utf8.out
collate.out
Probably not worth checking out, but I placed it here for completeness, boring POSIX checks.