I modified @Phil Sandler's code to remove the effect of calling GETDATE() during the timed inserts (there may be hardware effects/interrupts involved), and made the rows the same length.
[There have been several articles since SQL Server 2000 about timing issues and high-resolution timers, so I wanted to minimise that effect.]
Under the simple recovery model, with both the data and log files sized well beyond what is required, here are the timings in seconds (updated with new results based on the exact code below):
Identity (s)   GUID (s)
------------   --------
       2.876      4.060
       2.570      4.116
       2.513      3.786
       2.517      4.173
       2.410      3.610
       2.566      3.726
       2.376      3.740
       2.333      3.833
       2.416      3.700
       2.413      3.603
       2.910      4.126
       2.403      3.973
       2.423      3.653
------------------------
Avg    2.650      3.857
StdDev 0.227      0.204
The code used:
SET NOCOUNT ON

-- FILLER is 12 bytes narrower in TestGuid2 so that both tables have the same row length
-- (UNIQUEIDENTIFIER is 16 bytes, INT is 4 bytes)
CREATE TABLE TestGuid2 (Id UNIQUEIDENTIFIER NOT NULL DEFAULT NEWSEQUENTIALID() PRIMARY KEY,
SomeDate DATETIME, batchNumber BIGINT, FILLER CHAR(88))
CREATE TABLE TestInt (Id INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
SomeDate DATETIME, batchNumber BIGINT, FILLER CHAR(100))

DECLARE @NumRows INT = 1000000
CREATE TABLE #temp (Id INT NOT NULL IDENTITY(1,1) PRIMARY KEY, rowNum INT, adate DATETIME)
DECLARE @LocalCounter INT = 0

-- Put rows into the temp table first, so GETDATE() is never called inside the timed inserts
WHILE (@LocalCounter < @NumRows)
BEGIN
    INSERT INTO #temp(rowNum, adate) VALUES (@LocalCounter, GETDATE())
    SET @LocalCounter += 1
END

--Do inserts using GUIDs
DECLARE @GUIDTimeStart DATETIME = GETDATE()
INSERT INTO TestGuid2 (SomeDate, batchNumber)
SELECT adate, rowNum FROM #temp
DECLARE @GUIDTimeEnd DATETIME = GETDATE()

--Do inserts using IDENTITY
DECLARE @IdTimeStart DATETIME = GETDATE()
INSERT INTO TestInt (SomeDate, batchNumber)
SELECT adate, rowNum FROM #temp
DECLARE @IdTimeEnd DATETIME = GETDATE()

SELECT DATEDIFF(ms, @IdTimeStart, @IdTimeEnd) AS IdTime,
       DATEDIFF(ms, @GUIDTimeStart, @GUIDTimeEnd) AS GuidTime

DROP TABLE TestGuid2
DROP TABLE TestInt
DROP TABLE #temp
GO
After reading @Martin's investigation, I re-ran with the suggested TOP(@num) in both cases, i.e.
...
--Do inserts using GUIDs
DECLARE @num INT = 2147483647;
DECLARE @GUIDTimeStart DATETIME = GETDATE();
INSERT INTO TestGuid2 (SomeDate, batchNumber)
SELECT TOP(@num) adate, rowNum FROM #temp;
DECLARE @GUIDTimeEnd DATETIME = GETDATE();

--Do inserts using IDENTITY
DECLARE @IdTimeStart DATETIME = GETDATE();
INSERT INTO TestInt (SomeDate, batchNumber)
SELECT TOP(@num) adate, rowNum FROM #temp;
DECLARE @IdTimeEnd DATETIME = GETDATE();
...
and here are the timing results:
Identity (s)   GUID (s)
------------   --------
       2.436      2.656
       2.940      2.716
       2.506      2.633
       2.380      2.643
       2.476      2.656
       2.846      2.670
       2.940      2.913
       2.453      2.653
       2.446      2.616
       2.986      2.683
       2.406      2.640
       2.460      2.650
       2.416      2.720
------------------------
Avg    2.426      2.688
StdDev 0.010      0.032
I wasn't able to capture the actual execution plan, as the query never returned; a bug seems likely. (Running Microsoft SQL Server 2008 R2 (RTM) - 10.50.1600.1 (X64))
I'm in a similar situation. Currently I'm using the sequential GUID approach and have no fragmentation and easy key generation.
I have noticed two disadvantages that caused me to start migrating to bigint:
1. Space usage: 8 bytes more per index row (16 vs. 8). Multiply that by 10 indexes or so and you get a huge waste of space.
2. Columnstore indexes do not support GUIDs.
(2) was the killer for me; a minimal repro is below.
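For what it's worth, here is a minimal repro of (2). The table and index names are hypothetical, and the exact behaviour is version-dependent: on releases where columnstore indexes do not support uniqueidentifier (e.g. SQL Server 2012), the CREATE INDEX statement fails with a "data type cannot participate in a columnstore index" error.
-- Hypothetical repro: a GUID column in a columnstore index
CREATE TABLE dbo.GuidColumnstoreDemo
(
    Id UNIQUEIDENTIFIER NOT NULL,   -- the GUID key the columnstore index rejects
    Amount MONEY
);
-- Fails on versions without uniqueidentifier support in columnstore
CREATE NONCLUSTERED COLUMNSTORE INDEX ix_GuidColumnstoreDemo_cs
    ON dbo.GuidColumnstoreDemo (Id, Amount);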
I will now generate my keys like this:
yyMMddHH1234567890
I'll be using a leading date-plus-hour prefix with a sequential part after it. That lets me range-query my data by date without any additional index at all, which is a nice bonus for me.
I'll generate the sequential part of the bigint using a HiLo algorithm, which lends itself well to being distributed.
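A rough sketch of what that can look like in T-SQL follows; the SEQUENCE name, the block size, and the 10-digit width of the sequential part are assumptions for illustration, not a definitive implementation (SEQUENCE and FORMAT require SQL Server 2012 or later).
-- Sketch: compose a bigint key as a yyMMddHH prefix plus a 10-digit sequential part.
-- The SEQUENCE plays the "hi" role: each participant reserves a block in one round trip,
-- then hands out "lo" values locally; that is the essence of HiLo.
CREATE SEQUENCE dbo.KeyHi AS BIGINT START WITH 1 INCREMENT BY 1;
GO
DECLARE @blockSize BIGINT = 1000;                      -- "lo" values per reserved block
DECLARE @hi BIGINT = NEXT VALUE FOR dbo.KeyHi;         -- one allocation per block
DECLARE @lo BIGINT = 0;                                -- incremented locally, 0..@blockSize-1
DECLARE @seq BIGINT = @hi * @blockSize + @lo;          -- globally unique sequential part
DECLARE @prefix BIGINT = CAST(FORMAT(GETUTCDATE(), 'yyMMddHH') AS BIGINT);
SELECT @prefix * 10000000000 + @seq AS GeneratedKey;   -- 10 digits reserved for @seq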
Hope some of this transfers to your situation. I definitely recommend using bigint.
Best Answer
Any GUID used as the clustered key will be an issue simply because of its size. It leaves less space on each page for actual data, which lowers data density, and you are likely to see more page splits and fragmentation.
The problem when you use non-sequential GUIDs in an id column with a clustered index is fragmentation. If you had 100 rows in the table, inserted in batches of 20 with sequential GUID values as the id, and then wanted to add a row with a random id value, the engine may need to find space in the middle of the table for that record instead of just appending it at the end.
Depending on your fill factor this can cause page splits, which cause index fragmentation. Lowering the fill factor of the index can help, but not as much as always-sequential values, which let each new record go onto the newest page (or onto a new page added at the end) without reorganizing existing data. You can measure the effect as shown below.
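To make that concrete, fragmentation can be measured with the sys.dm_db_index_physical_stats DMV, and the fill factor set on rebuild. The table names below reuse the test tables from the benchmark above, and the 80 percent fill factor is just an example value:
-- Measure logical fragmentation of the clustered indexes on the two test tables
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
WHERE ips.object_id IN (OBJECT_ID('dbo.TestGuid2'), OBJECT_ID('dbo.TestInt'));

-- Rebuild with a lower fill factor to leave room on each page for out-of-order inserts
ALTER INDEX ALL ON dbo.TestGuid2 REBUILD WITH (FILLFACTOR = 80);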
Check here for more info.