Sql-server – Use GUID as primary key in Azure

azure-sql-databasesql server

I want to use GUIDs since they are portable and makes it easier to use several database. But since there are performance issues it might not be a good idea? Although I've found a MS blog entry which states otherwise: http://blogs.msdn.com/b/cbiyikoglu/archive/2012/05/17/id-generation-in-federations-identity-sequences-and-guids-uniqueidentifier.aspx

I've read about an alternative where a datetime2 column is used as the clustered index and the Guid as a PK. As most of my tables have a CreatedAtUtc column that should be a viable option for me. But do it really help?

So my questions is whether I should use GUIDs to get portability or if there are better solutions?

(Using GUIDs I can also create the Id in my code and do more stuff before committing). I've also created a COMB generator which should work better with MSSQL than regular GUIDs.

Best Answer

GUIDs only really cause performance problems for a couple of reasons.

First their size. You are using a 16 byte identifier where an int(4 bytes) or even bigint(8 bytes) is much smaller. This doesn't really matter so much on a wide table or short table. If your table is fairly narrow then the percent increase is of course greater. A 2000 byte width vs a 2012 byte width is really not that big a deal where a 20 vs 32 is obviously a greater difference. And if your table is going to be large, millions or even 100's of millions of rows, then that 8-12 bytes you are adding is going to make a more obvious difference than say a table with only 1000 rows. On the other hand if you need that portability then you may be willing to accept the space issues.

Next are the issues that only occur if the GUID is part of the clustered index. If the GUID is the clustered index then the GUID will be included the leaf level pages of all non-clustered indexes. So to compare, a table with an integer clustered index will only add 4 bytes to the leaf page while the GUID will add 16. This can add up over time particularly with multiple indexes. It also will of course increase the number of reads to use that index since you can fit less entries per page. The other problem GUIDs have with a clustered index is that they are not typically in order. This means potential page splits during inserts. The link you provided mentioned that this isn't a huge increase over the write times in Azure however. Typically you could use use NEWSEQUENTIALID() to generate the values as it returns a value greater than any previous value generated by NEWSEQUENTIALID() on that computer since windows was started however it is not supported by Azure.

In my opinion if you are already going to have a CreatedAtUTC column then go ahead and make it the clustered index and use the GUID as your primary key. Also here is a link I found to a blog by the Azure team on the subject. Of course he self processes to love GUIDs so he is somewhat biased. Also another benefit of using your CreatedAtUTC as the clustered index is that it doesn't change. This means minimal row moves.