SQL Server Identity Seed – Benefits of Using -2,147,483,648 for Large Tables

identity, sql-server

I just had a showerthought: the default identity seed is 1. I have some tables that I know will grow into the billions at some point. Wouldn't it make more sense to start at int.Min (-2,147,483,648) for these tables?

This could make the difference between migrating your key to bigint in four years or in eight years, which can be relevant enough.

Is this common? It feels weird. Is there anything I am missing?

Best Answer

There is nothing at all wrong with starting at -2,147,483,648 as far as SQL Server is concerned. Starting at 2,147,483,647 and counting backwards with IDENTITY(2147483647,-1) is perfectly valid too.
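As a minimal sketch of both options (the table and column names here are just for illustration, not from the question), either seed is accepted on an int identity column:

```sql
-- Start at the smallest int value and count upwards...
CREATE TABLE dbo.EventsAscending
(
    EventId int IDENTITY(-2147483648, 1) NOT NULL PRIMARY KEY,
    Payload varchar(50) NOT NULL
);

-- ...or start at the largest int value and count downwards.
CREATE TABLE dbo.EventsDescending
(
    EventId int IDENTITY(2147483647, -1) NOT NULL PRIMARY KEY,
    Payload varchar(50) NOT NULL
);

INSERT INTO dbo.EventsAscending (Payload) VALUES ('first row');   -- gets -2,147,483,648
INSERT INTO dbo.EventsDescending (Payload) VALUES ('first row');  -- gets 2,147,483,647
```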

Things that would make me wary of doing so:

  1. It might confuse people who don't expect to see negative values in an ID column. It is unusual enough that it could easily look like a bug. If nothing else, you might get sick of explaining it.
  2. If you ever pass the IDs to other services for any reason, they may fail input validation due to that code not expecting to see negatives (an external system shouldn't care about your internal IDs like this, but many things happen that shouldn't!). Furthermore, code in other layers of your application might use an unsigned int32, making the "down from int.max" option even more dangerous, as you won't hit the problem until the values drop below 0.
  3. People sometimes use negative magic numbers to carry special meaning, and this could cause collisions that lead to difficult-to-explain bugs. For instance, shortening WHERE (x.a<>y.a OR x.a IS NULL AND y.a IS NOT NULL OR x.a IS NOT NULL AND y.a IS NULL) to WHERE ISNULL(x.a,-1)<>ISNULL(y.a,-1) is something I've seen numerous times (see the sketch after this list). That's not likely for your PK, as it will never be NULL, but it might happen when comparing FK values in other tables. I've even seen someone use somevalue>-1 in place of somevalue IS NOT NULL (apparently there is a circumstance where that is more efficient, though he never explained to my satisfaction what that circumstance might be!). There might be that and other odd shenanigans in code outside the DB too.
  4. Most importantly: scale, particularly unexpected scale. Software and data that don't die early often outlive their creators' vision, and grow more over time accordingly. Unless you are very constrained by storage or RAM (perhaps in an embedded system), plan for at least one order of magnitude more than you expect.
    If I expect that hitting ~400,000,000 rows is ever going to be likely in the lifetime of the data, then long before the difference between 2,000,000,000 and 4,000,000,000 becomes a consideration I'm already going to go for a larger key, so doubling the range by using negative values isn't going to make much difference.
    You don't want to be making a change to something as core as a PK in four years' time, but you want to do it even less in eight years' time. In either case, if the design lasts that long you will have long forgotten the details, and many other bits & bobs will have started to depend on the key, so the changes that need to be made grow massively. Even that single table is going to be massive work to change (unless most of the data gets deleted after a time) because it contains that many rows, and then you have all the tables that refer to it with FKs to work on as well.
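To make point 3 concrete, here is a minimal sketch; @x and @y stand in for two hypothetical tables being compared, and the data is invented. The spelled-out predicate and the ISNULL shortcut agree as long as -1 can never be a real key, but disagree once a negative seed makes -1 a value an identity column can actually hold:

```sql
DECLARE @x TABLE (a int NULL);
DECLARE @y TABLE (a int NULL);

INSERT INTO @x (a) VALUES (-1);    -- a real key produced by a negative identity seed
INSERT INTO @y (a) VALUES (NULL);  -- a genuinely missing value

-- The spelled-out predicate correctly reports the rows as different (returns 1):
SELECT 'long form' AS test, COUNT(*) AS differing_rows
FROM @x AS x CROSS JOIN @y AS y
WHERE (x.a <> y.a
    OR (x.a IS NULL AND y.a IS NOT NULL)
    OR (x.a IS NOT NULL AND y.a IS NULL));

-- The shortcut maps both the real -1 and the NULL to -1, so it misses the
-- difference (returns 0):
SELECT 'shortcut' AS test, COUNT(*) AS differing_rows
FROM @x AS x CROSS JOIN @y AS y
WHERE ISNULL(x.a, -1) <> ISNULL(y.a, -1);
```

With the default seed of 1 the sentinel never collides with a real key; once negative values are in play, the two queries above silently disagree.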