Going backwards just feels wrong to me.
With only two data centers you could also implement identity ranges. Unless you cycle through identity values at an alarming rate, there is no reason you can't have:
-- Data center 1
CREATE TABLE dbo.[Table]
(
ID INT IDENTITY(1,1) PRIMARY KEY
-- , ...
);
-- Data center 2
CREATE TABLE dbo.[Table]
(
ID INT IDENTITY(1000000000,1) PRIMARY KEY
-- , ...
);
This would allow you to generate 1 billion (well, 999,999,999) values in data center 1 before there is any danger of collision with data center 2. And you could add a CHECK constraint in data center 1 to prevent overlapping values, depending on how you prioritize errors vs. duplicates. You could also implement a recurring job that periodically checks how close you are to the lower bound of the other data center, if you're concerned that you'll really generate over a billion values in either data center in the app's lifetime (never mind yours).
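A minimal sketch of that CHECK constraint, assuming the seeds above (the constraint name is just an illustration; adjust the boundary to match your actual seeds):

```sql
-- Data center 1: reject any ID that strays into data center 2's range.
ALTER TABLE dbo.[Table]
    ADD CONSTRAINT CK_Table_ID_Range CHECK (ID < 1000000000);
```

With this in place, an insert that somehow lands in data center 2's range fails loudly instead of silently creating a future duplicate.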
If ~1 billion is not enough, there are other alternatives that give you more room to breathe:
-- Data center 1
ID INT IDENTITY(-2147483648,1)
-- Data center 2
ID INT IDENTITY(1,1)
This would give each data center over 2 billion non-overlapping values, with no danger of collision until data center 1 approached 0. And if that's still not enough:
-- Data center 1
ID BIGINT IDENTITY(-9223372036854775808,1)
-- Data center 2
ID BIGINT IDENTITY(1,1)
Or if you want to keep all values > 0, you could split the positive range roughly in half (you can be a little more precise than this if you're more pedantic than me).
-- Data center 1
ID BIGINT IDENTITY(1,1)
-- Data center 2
ID BIGINT IDENTITY(4611600000000000000,1)
I don't even know how to say that number, but it's ginormous. And in reality you'd have to work extremely hard on a very fast computer to use up all those values by the time your great great great great great grandchild graduates from college. And if you are on Enterprise Edition you can use data compression so that you don't pay the hit for all 8 bytes, at least in data center 1, until you exceed 2 billion there.
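As a sketch, row compression can be enabled with a rebuild (assuming the example table name above; Enterprise Edition only, per the note):

```sql
-- Row compression stores small BIGINT values in fewer than 8 bytes
-- on disk, so data center 1 pays little until values grow large.
ALTER TABLE dbo.[Table]
    REBUILD WITH (DATA_COMPRESSION = ROW);
```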
In a system I managed, I did this a slightly different way: we had multiple web servers running Express instances that were responsible for generating ID numbers that needed to be unique across the enterprise. So we just set up a sequence generator on each machine (they didn't actually need to store the value) using a BIGINT IDENTITY column. We had fewer than 9 servers, so they were all seeded like this:
-- WEBAPP0001
ID BIGINT IDENTITY(10000000000,1)
-- WEBAPP0002
ID BIGINT IDENTITY(20000000000,1)
-- WEBAPP0003
ID BIGINT IDENTITY(30000000000,1)
...
When the values were used and then merged in the central system, not only were we guaranteed not to have any duplicates, but it was also easy to recognize instantly which web server they came from (which was sometimes useful in debugging) without introducing any composite key requirements. And we had no concerns that any one web server would ever generate more than 10 billion values.
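The "which server did this come from" trick above is just integer division against the seed multiplier. A sketch, with a hypothetical central table name:

```sql
-- Seeds are 10000000000 * server number, so dividing any merged ID
-- by 10000000000 recovers the originating web server.
SELECT ID, ID / 10000000000 AS ServerNumber
FROM dbo.MergedValues;
```

For example, ID 30000000042 yields ServerNumber 3, i.e. WEBAPP0003.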
Many people would go with a GUID in this scenario, but I think there are several strong arguments against that approach.
Best Answer
This is an ongoing debate that many face, and experts will give good arguments for either of these. What your question boils down to is a surrogate vs. natural key (and option two seems to be a half-surrogate half-natural key).
As for your third option, I often see people do whatever they can to find a unique candidate for a primary key constraint. Take your example: name. That is a notoriously bad candidate for a key, because uniqueness is not guaranteed; think "John Smith". Not only that, but keying on it will bloat your foreign key relationships: every table that references that primary key must repeat the same data type and column width. Think a wide nvarchar as opposed to a narrow data type (i.e. int).
As for your second option, it sounds like you have the ability to create a unique value (by your description, one able to fit in an int column). Where would you plan on doing this? The logic would live either in the database or in the application. In my opinion, I like to keep entities like keys isolated to the data tier. This "id" would most likely be used for normalized data, and it should be an abstraction to the application (in my opinion). So my reasoning brings me to either a database-generated unique value (your option two) or an identity value (your option one). Those both have the same result (as far as I can see), so why go through the trouble and the development of generating your own when you can just use an IDENTITY column?

Again, others may and probably will have different and equally objective answers for going with another alternative, but my vote would be for the IDENTITY column for a unique, surrogate key.
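A minimal sketch of that surrogate-key approach (table and column names are purely illustrative):

```sql
-- The database generates the key; foreign keys stay a narrow 4 bytes,
-- and the natural attribute remains just an attribute, not the key.
CREATE TABLE dbo.Person
(
    PersonID INT IDENTITY(1,1) PRIMARY KEY,
    Name NVARCHAR(100) NOT NULL  -- not unique: think "John Smith"
);
```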