SQL Server – Should Frontend Generated UUIDv6 be Converted to Binary(16) for Clustered Primary Key?

clustered-primary-keysql serveruuid

Background

As suggested by the front-end developer, I looked into using UUID as the primary key for a bunch of tables in our new system. From learning the pros and cons of random vs. sequential UUIDs, to the use of a non-clustered primary key in combination of a clustered index with a sort-able type, my research pointed me to UUIDv6, and an implementation of it.

It is able to generate UUIDs like the below (that is sequential):

UUIDv1                               UUIDv6 
------------------------------------ ------------------------------------ 
5714f720-1268-11e7-a24b-96d95aa38c32 1e712685-714f-6720-a23a-c90103f70be6 
68f820c0-1268-11e7-a24b-96d95aa38c32 1e712686-8f82-60c0-ac07-7d6641ed230d 
7ada38f0-1268-11e7-a24b-96d95aa38c32 1e712687-ada3-68f0-93f8-c1ebf8e6fc8c 
8cc06fd0-1268-11e7-a24b-96d95aa38c32 1e712688-cc06-6fd0-a828-671acd892c6a 
9ea6a6b0-1268-11e7-a24b-96d95aa38c32 1e712689-ea6a-66b0-910c-dbcdb07df7a4

Which I thought SQL Server would gladly sort them for me in the clustered primary key (uniqueidentifier) column.

Little did I know how SQL Server would sort an uniqueidentifier column. Here's the ascending sort result:

UUIDv6 uniqueidentifier sorted 
------------------------------------ 
1e712688-cc06-6fd0-a828-671acd892c6a 
1e712686-8f82-60c0-ac07-7d6641ed230d 
1e712687-ada3-68f0-93f8-c1ebf8e6fc8c 
1e712685-714f-6720-a23a-c90103f70be6 
1e712689-ea6a-66b0-910c-dbcdb07df7a4

Which is causing fragmentation as if using random UUIDs. This post explains how they were actually sorted.

The real question

Luckily, the system is still in development. Which of these options should I go for next?

  1. reorder the bytes so that the most/least significant bytes are where SQL Server expects them to be
UUIDv6                               UUIDv6 reordered bytes 
------------------------------------ ------------------------------------ 
1e712685-714f-6720-a23a-c90103f70be6 c90103f7-0be6-a23a-6720-1e712685714f 
1e712686-8f82-60c0-ac07-7d6641ed230d 7d6641ed-230d-ac07-60c0-1e7126868f82 
1e712687-ada3-68f0-93f8-c1ebf8e6fc8c c1ebf8e6-fc8c-93f8-68f0-1e712687ada3 
1e712688-cc06-6fd0-a828-671acd892c6a 671acd89-2c6a-a828-6fd0-1e712688cc06 
1e712689-ea6a-66b0-910c-dbcdb07df7a4 dbcdb07d-f7a4-910c-66b0-1e712689ea6a
  1. convert the UUIDv6 to binary(16) and use that instead
UUIDv6                               UUIDv6 binary(16) 
------------------------------------ -------------------------------- 
1e712685-714f-6720-a23a-c90103f70be6 1e712685714f6720a23ac90103f70be6 
1e712686-8f82-60c0-ac07-7d6641ed230d 1e7126868f8260c0ac077d6641ed230d 
1e712687-ada3-68f0-93f8-c1ebf8e6fc8c 1e712687ada368f093f8c1ebf8e6fc8c 
1e712688-cc06-6fd0-a828-671acd892c6a 1e712688cc066fd0a828671acd892c6a 
1e712689-ea6a-66b0-910c-dbcdb07df7a4 1e712689ea6a66b0910cdbcdb07df7a4

Problem with option 1

The UUID standard embeds a 4-bit version field inside the ID. UUIDv6 (still non-standard) also follows that rule. The way I will reorder them is going to break this.

Problem with option 2

I'm not sure. Can hardly find anyone talking about it except this, which is going against the idea. Are there other pitfalls that I should be aware of in using the binary(16) type?

Thanks!

Best Answer

reorder the bytes so that the most/least significant bytes are where SQL Server expects them to be

Do that.

Option 1 is what SQL Server does natively with NEWSEQUENTIALID(), as wikipedia puts it

NEWSEQUENTIALID function returns 128-bit identifiers similar to UUIDs which are committed to ascend in sequence until the next system reboot

In SQL Server a UNIQUEIDENTIFIER is just a 128-bit binary type. It's not required to conform to the structure of a UUID.

And if it's sequentially-generated* you can make it the clustered index key. Having a narrower clustered index key is normally not worth the cost of having an extra and unnecessaary index.

*It's not a big deal if the location in the sort order of the sequential values changes occasionally or is generated in a few different places by a few different application servers.