SQL Server – Sequential GUID or bigint for ‘huge’ database table PK

primary-key, sql-server, uniqueidentifier

I know this type of question comes up a lot, but I've yet to read any compelling arguments to help me make this decision. Please bear with me!

I have a huge database – it grows by about 10,000,000 records per day. The data is relational, and for performance reasons I load the table with BULK COPY. For this reason, I need to generate keys for the rows, and cannot rely on an IDENTITY column.

A 64-bit integer – a bigint – is wide enough for me to use, but in order to guarantee uniqueness, I need a centralised generator to make my IDs for me. I currently have such a generator service which allows a service to reserve X sequence numbers and guarantees no collisions. However, a consequence of this is that all the services I have are reliant on this one centralised generator, and so I'm limited in how I can distribute my system and am not happy about the other dependencies (such as requiring network access) imposed by this design. This has been a problem on occasion.

I'm now considering using sequential GUIDs as my primary keys (generated externally to SQL). As far as I've been able to ascertain from my own testing, the only drawback to these is the disk space overhead of a wider data type (which is exacerbated by their use in indexes). I've not witnessed any discernible slowdown in query performance, compared to the bigint alternative. Loading the table with BULK COPY is slightly slower, but not by much. My GUID-based indexes are not becoming fragmented thanks to my sequential GUID implementation.
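For illustration, here is a minimal sketch of one common way to generate such sequential GUIDs outside SQL Server: a COMB-style GUID that puts a millisecond timestamp in the trailing six bytes, because SQL Server orders uniqueidentifier values by those bytes first. This is a sketch of the general technique, not necessarily my exact generator:

```python
import os
import time
import uuid

def sequential_guid() -> uuid.UUID:
    """COMB-style GUID: 10 random bytes followed by a 48-bit millisecond
    timestamp. SQL Server sorts uniqueidentifier values by the trailing
    6 bytes first, so new values always sort after older ones and inserts
    land at the end of the index. (RFC 4122 version bits are not set here.)"""
    random_part = os.urandom(10)
    millis = int(time.time() * 1000) & 0xFFFFFFFFFFFF
    return uuid.UUID(bytes=random_part + millis.to_bytes(6, "big"))
```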

Basically, what I want to know is if there are any other considerations I may have overlooked. At the moment, I'm inclined to take the leap and start using GUIDs. I'm by no means a database expert, so I'd really appreciate any guidance.

Best Answer

I'm in a similar situation. Currently, I'm using the sequential GUID approach, and it gives me no fragmentation and easy key generation.

I have noticed two disadvantages that caused me to start migrating to bigint:

  1. Space usage. A GUID costs 8 more bytes than a bigint in every row of every index. Multiply that by 10 indexes or so and you waste a huge amount of space.
  2. Columnstore indexes do not support GUIDs.

(2) was the killer for me.

I will now generate my keys like this:

yyMMddHH1234567890

I'll be using a leading date-plus-hour prefix with a sequential part after it. That allows me to range-query my data by date without any additional index at all. This is a nice bonus for me.
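As a sketch of how such a key can be composed (Python here just for illustration; the ten-digit sequence budget matches the example key above, and taking the date in UTC is my choice, not a requirement):

```python
from datetime import datetime, timezone

SEQ_DIGITS = 10  # ten-digit sequential part, matching the example key above

def compose_key(seq: int, when: datetime) -> int:
    """Build a bigint of the form yyMMddHH followed by a zero-padded
    sequential part, e.g. seq=42 at 2024-06-15 12:xx -> 240615120000000042."""
    if not 0 <= seq < 10 ** SEQ_DIGITS:
        raise ValueError("sequence part overflows its digit budget")
    prefix = int(when.strftime("%y%m%d%H"))   # yyMMddHH as an integer
    key = prefix * 10 ** SEQ_DIGITS + seq
    assert key < 2 ** 63                      # always fits in a signed 64-bit bigint
    return key

# The range-query bonus: all keys for a given hour fall in one contiguous interval.
hour = datetime(2024, 6, 15, 12, tzinfo=timezone.utc)
start = compose_key(0, hour)
end = compose_key(10 ** SEQ_DIGITS - 1, hour)
# ... WHERE Id BETWEEN {start} AND {end}
```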

I'll generate the sequential part of the bigint using a HiLo algorithm, which lends itself well to distributed key generation.
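Roughly, the idea looks like this (a generic HiLo sketch, not my exact code; `fetch_next_hi` stands in for whatever atomic central counter you use, such as a SQL Server SEQUENCE, and the block size is arbitrary):

```python
import threading

class HiLoGenerator:
    """Generic HiLo sketch: one round trip to the central allocator per
    block of `block_size` ids; every other id is generated locally."""

    def __init__(self, fetch_next_hi, block_size=10_000):
        self._fetch_next_hi = fetch_next_hi  # callback to an atomic central counter
        self._block_size = block_size
        self._hi = 0
        self._lo = block_size                # force a block fetch on first use
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            if self._lo >= self._block_size:
                self._hi = self._fetch_next_hi()   # only network call, once per block
                self._lo = 0
            value = self._hi * self._block_size + self._lo
            self._lo += 1
            return value
```

The value it returns can then be used as the sequential part behind the yyMMddHH prefix, so each service only touches the central allocator when it exhausts a block.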

Hope some of this transfers to your situation. I definitely recommend using bigint.