Background
As suggested by the front-end developer, I looked into using UUID as the primary key for a bunch of tables in our new system. From learning the pros and cons of random vs. sequential UUIDs, to the use of a non-clustered primary key in combination of a clustered index with a sort-able type, my research pointed me to UUIDv6, and an implementation of it.
It is able to generate UUIDs like the below (that is sequential):
UUIDv1 UUIDv6 ------------------------------------ ------------------------------------ 5714f720-1268-11e7-a24b-96d95aa38c32 1e712685-714f-6720-a23a-c90103f70be6 68f820c0-1268-11e7-a24b-96d95aa38c32 1e712686-8f82-60c0-ac07-7d6641ed230d 7ada38f0-1268-11e7-a24b-96d95aa38c32 1e712687-ada3-68f0-93f8-c1ebf8e6fc8c 8cc06fd0-1268-11e7-a24b-96d95aa38c32 1e712688-cc06-6fd0-a828-671acd892c6a 9ea6a6b0-1268-11e7-a24b-96d95aa38c32 1e712689-ea6a-66b0-910c-dbcdb07df7a4
Which I thought SQL Server would gladly sort them for me in the clustered primary key (uniqueidentifier) column.
Little did I know how SQL Server would sort an uniqueidentifier column. Here's the ascending sort result:
UUIDv6 uniqueidentifier sorted ------------------------------------ 1e712688-cc06-6fd0-a828-671acd892c6a 1e712686-8f82-60c0-ac07-7d6641ed230d 1e712687-ada3-68f0-93f8-c1ebf8e6fc8c 1e712685-714f-6720-a23a-c90103f70be6 1e712689-ea6a-66b0-910c-dbcdb07df7a4
Which is causing fragmentation as if using random UUIDs. This post explains how they were actually sorted.
The real question
Luckily, the system is still in development. Which of these options should I go for next?
- reorder the bytes so that the most/least significant bytes are where SQL Server expects them to be
UUIDv6 UUIDv6 reordered bytes ------------------------------------ ------------------------------------ 1e712685-714f-6720-a23a-c90103f70be6 c90103f7-0be6-a23a-6720-1e712685714f 1e712686-8f82-60c0-ac07-7d6641ed230d 7d6641ed-230d-ac07-60c0-1e7126868f82 1e712687-ada3-68f0-93f8-c1ebf8e6fc8c c1ebf8e6-fc8c-93f8-68f0-1e712687ada3 1e712688-cc06-6fd0-a828-671acd892c6a 671acd89-2c6a-a828-6fd0-1e712688cc06 1e712689-ea6a-66b0-910c-dbcdb07df7a4 dbcdb07d-f7a4-910c-66b0-1e712689ea6a
- convert the UUIDv6 to binary(16) and use that instead
UUIDv6 UUIDv6 binary(16) ------------------------------------ -------------------------------- 1e712685-714f-6720-a23a-c90103f70be6 1e712685714f6720a23ac90103f70be6 1e712686-8f82-60c0-ac07-7d6641ed230d 1e7126868f8260c0ac077d6641ed230d 1e712687-ada3-68f0-93f8-c1ebf8e6fc8c 1e712687ada368f093f8c1ebf8e6fc8c 1e712688-cc06-6fd0-a828-671acd892c6a 1e712688cc066fd0a828671acd892c6a 1e712689-ea6a-66b0-910c-dbcdb07df7a4 1e712689ea6a66b0910cdbcdb07df7a4
Problem with option 1
The UUID standard embeds a 4-bit version field inside the ID. UUIDv6 (still non-standard) also follows that rule. The way I will reorder them is going to break this.
Problem with option 2
I'm not sure. Can hardly find anyone talking about it except this, which is going against the idea. Are there other pitfalls that I should be aware of in using the binary(16) type?
Thanks!
Best Answer
Do that.
Option 1 is what SQL Server does natively with NEWSEQUENTIALID(), as wikipedia puts it
In SQL Server a UNIQUEIDENTIFIER is just a 128-bit binary type. It's not required to conform to the structure of a UUID.
And if it's sequentially-generated* you can make it the clustered index key. Having a narrower clustered index key is normally not worth the cost of having an extra and unnecessaary index.
*It's not a big deal if the location in the sort order of the sequential values changes occasionally or is generated in a few different places by a few different application servers.