SQL Server – Using IDENTITY Column with Increment -1

auto-increment, database-design, identity, sql-server, sql-server-2012

I need to run a SQL Server logging database with two main tables in two separate datacentres, writing to both at the same time.

I had the idea of restoring the database to the new datacentre and then reseeding the identity column to -1 and setting the increment to -1; that way there would never be any chance of duplicate IDs when the data needs to be combined. DATACENTRE1 would generate positive integers, DATACENTRE2 negative integers.
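Something like this is what I had in mind for DATACENTRE2 (the table and column names here are just examples). As far as I know, DBCC CHECKIDENT can change the seed of an existing column but not the increment, so the table (or at least the column) would have to be rebuilt:

```sql
-- DATACENTRE2: count downwards from -1 so IDs never collide
-- with DATACENTRE1's positive values.
CREATE TABLE dbo.LogEntry
(
  ID INT IDENTITY(-1,-1) PRIMARY KEY
  -- , ...
);

-- Reseeding alone only moves the starting point; the -1 increment
-- has to be declared when the column is (re)created:
DBCC CHECKIDENT ('dbo.LogEntry', RESEED, -1);
```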

Would an increment of -1 cause any issues?

Best Answer

Going backwards just feels wrong to me.

With only two data centers you could also implement identity ranges. Unless you cycle through identity values at an alarming rate, there is no reason you can't have:

-- Data center 1
CREATE TABLE dbo.[Table]
(
  ID INT IDENTITY(1,1) PRIMARY KEY
  -- , ...
);

-- Data center 2
CREATE TABLE dbo.[Table]
(
  ID INT IDENTITY(1000000000,1) PRIMARY KEY
  -- , ...
);

This would allow you to generate 1 billion (well, 999,999,999) values in data center 1 before there is any danger of collision with data center 2. And you could add a CHECK constraint in data center 1 to prevent overlapping values, depending on how you prioritize errors vs. duplicates. You could also implement a recurring job that periodically checks how close you are to the lower bound of the other data center, if you're concerned that you'll really generate over a billion values in either data center in the app's lifetime (never mind yours).
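The CHECK constraint in data center 1 could look something like this (a sketch; the constraint name is arbitrary, and the bound matches the seeds above):

```sql
-- Data center 1: refuse any ID that strays into data center 2's range.
-- An insert that would reach 1000000000 fails with a constraint violation
-- instead of silently creating a future duplicate.
ALTER TABLE dbo.[Table]
  ADD CONSTRAINT CK_Table_ID_Range CHECK (ID < 1000000000);
```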

If ~1 billion is not enough, there are two other alternatives that give you more room to breathe:

-- Data center 1
  ID INT IDENTITY(-2147483648,1)

-- Data center 2
  ID INT IDENTITY(1,1)

This would give each data center over 2 billion non-overlapping values, with no danger of collision until data center 1 approached 0. And if that's still not enough:

-- Data center 1
  ID BIGINT IDENTITY(-9223372036854775808,1)

-- Data center 2
  ID BIGINT IDENTITY(1,1)

Or if you want to keep all values > 0, you could split the positive range roughly in half (you can be a little more precise than this if you're more pedantic than me).

-- Data center 1
  ID BIGINT IDENTITY(1,1)

-- Data center 2
  ID BIGINT IDENTITY(4611600000000000000,1)

I don't even know how to say that number, but it's ginormous. And in reality you'd have to work extremely hard on a very fast computer to use up all those values by the time your great great great great great grandchild graduates from college. And if you are on Enterprise Edition you can use data compression so that you don't pay the hit for all 8 bytes, at least in data center 1, until you exceed 2 billion there.
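The compression piece might look like this (a sketch, assuming an existing table in data center 1; data compression requires Enterprise Edition on older versions of SQL Server):

```sql
-- Row compression stores small BIGINT values in fewer bytes on disk,
-- so you only pay for the full 8 bytes once the values actually grow.
ALTER TABLE dbo.[Table]
  REBUILD WITH (DATA_COMPRESSION = ROW);
```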


In a system I managed, I did this a slightly different way: we had multiple web servers with SQL Server Express instances that were responsible for generating ID numbers that needed to be unique across the enterprise. So we just set up a sequence generator on each machine (they didn't need to actually store the values) using a BIGINT IDENTITY column. We had fewer than 9 servers, so they were all seeded like this:

-- WEBAPP0001
ID BIGINT IDENTITY(10000000000,1)

-- WEBAPP0002
ID BIGINT IDENTITY(20000000000,1)

-- WEBAPP0003
ID BIGINT IDENTITY(30000000000,1)

...

When the values were used and then merged in the central system, not only were we guaranteed not to have any duplicates, but it was also easy to instantly recognize which web server they came from (which was sometimes useful in debugging) without introducing any composite key requirements. And we had no concerns that any one web server would ever generate more than 10 billion values.
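Recovering the source server from a merged ID is just integer division (a sketch; the merged table name is a placeholder):

```sql
-- 10000000000 is the per-server block size from the seeds above, so
-- integer division yields 1 for WEBAPP0001, 2 for WEBAPP0002, and so on.
SELECT ID,
       ID / 10000000000 AS SourceServer
FROM dbo.MergedLog;
```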

Many people would go with a GUID in this scenario, but I think there are several strong arguments against that approach.