I modified @Phil Sandler's code to remove the effect of calling GETDATE() (there may be hardware effects/interrupts involved??), and made rows the same length.
[There have been several articles since SQL Server 2000 relating to timing issues and high-resolution timers, so I wanted to minimise that effect.]
In simple recovery model with data and log file both sized way over what is required, here are the timings (in seconds): (Updated with new results based on exact code below)
Identity(s) Guid(s)
--------- -----
2.876 4.060
2.570 4.116
2.513 3.786
2.517 4.173
2.410 3.610
2.566 3.726
2.376 3.740
2.333 3.833
2.416 3.700
2.413 3.603
2.910 4.126
2.403 3.973
2.423 3.653
-----------------------
Avg 2.650 3.857
StdDev 0.227 0.204
The code used:
SET NOCOUNT ON
CREATE TABLE TestGuid2 (Id UNIQUEIDENTIFIER NOT NULL DEFAULT NEWSEQUENTIALID() PRIMARY KEY,
SomeDate DATETIME, batchNumber BIGINT, FILLER CHAR(88))
CREATE TABLE TestInt (Id Int NOT NULL identity(1,1) PRIMARY KEY,
SomeDate DATETIME, batchNumber BIGINT, FILLER CHAR(100))
DECLARE @Numrows INT = 1000000
CREATE TABLE #temp (Id int NOT NULL Identity(1,1) PRIMARY KEY, rowNum int, adate datetime)
DECLARE @LocalCounter INT = 0
--put rows into temp table
WHILE (@LocalCounter < @NumRows)
BEGIN
INSERT INTO #temp(rowNum, adate) VALUES (@LocalCounter, GETDATE())
SET @LocalCounter += 1
END
--Do inserts using GUIDs
DECLARE @GUIDTimeStart DateTime = GETDATE()
INSERT INTO TestGuid2 (SomeDate, batchNumber)
SELECT adate, rowNum FROM #temp
DECLARE @GUIDTimeEnd DateTime = GETDATE()
--Do inserts using IDENTITY
DECLARE @IdTimeStart DateTime = GETDATE()
INSERT INTO TestInt (SomeDate, batchNumber)
SELECT adate, rowNum FROM #temp
DECLARE @IdTimeEnd DateTime = GETDATE()
SELECT DATEDIFF(ms, @IdTimeStart, @IdTimeEnd) AS IdTime, DATEDIFF(ms, @GUIDTimeStart, @GUIDTimeEnd) AS GuidTime
DROP TABLE TestGuid2
DROP TABLE TestInt
DROP TABLE #temp
GO
After reading @Martin's investigation, I re-ran with the suggested TOP(@num) in both cases, i.e.
...
--Do inserts using GUIDs
DECLARE @num INT = 2147483647;
DECLARE @GUIDTimeStart DATETIME = GETDATE();
INSERT INTO TestGuid2 (SomeDate, batchNumber)
SELECT TOP(@num) adate, rowNum FROM #temp;
DECLARE @GUIDTimeEnd DATETIME = GETDATE();
--Do inserts using IDENTITY
DECLARE @IdTimeStart DateTime = GETDATE()
INSERT INTO TestInt (SomeDate, batchNumber)
SELECT TOP(@num) adate, rowNum FROM #temp;
DECLARE @IdTimeEnd DateTime = GETDATE()
...
and here are the timing results:
Identity(s) Guid(s)
--------- -----
2.436 2.656
2.940 2.716
2.506 2.633
2.380 2.643
2.476 2.656
2.846 2.670
2.940 2.913
2.453 2.653
2.446 2.616
2.986 2.683
2.406 2.640
2.460 2.650
2.416 2.720
-----------------------
Avg 2.426 2.688
StdDev 0.010 0.032
I wasn't able to get the actual execution plan, as the query never returned! It seems a bug is likely. (Running Microsoft SQL Server 2008 R2 (RTM) - 10.50.1600.1 (X64))
Foreign Key Constraints and Indexes are two completely separate entities within a database. As you have explicitly stated in your question, foreign key relationships are to ensure Referential Integrity. If your database copy is soley for OLAP that wouldn't really come into play (as long as you aren't doing any DML batches against it).
The indexes are what are going to impact your query performance. And if the above is true (no INSERT
, UPDATE
, or DELETE
), you shouldn't be too concerned with the overhead of index maintenance.
Because of this, you can wisely choose indexes in order to maximize performance. But without seeing more details on your data, and what already exists, generalized ideas are all that can be given to you.
EDIT: As per Grant Fritchey's blog post, there can be a performance gain due to foreign key constraints, as the query optimizer can omit certain unnecessary operations due to the referential aspects of the constraint. For more information, please see Grant's blog post: Do Foreign Key Constraints Help Performance?
Best Answer
I'll answer here as well. It has to do with the internals of how
IDENTITY
andSEQUENCE
work.With
IDENTITY
, SQL Server pre-caches values into memory so that they are readily available. See Martin Smith's answer for the details. As values are used, a background process generates more values. As you can imagine this pool can run out pretty quickly, leaving the application at the mercy of the background process that is generating the values.With
SEQUENCE
, SQL Server allows you to define how large the cache should be. While SQL Server doesn't actually keep the values in the cache, it only keeps the current value and the top end value, this will greatly reduce the amount of IO that is needed to create values.Don't set the cache too high, as this will reduce the number of numbers which can be used: if SQL Server were to crash, any values specified in the current cache range which weren't used would be lost.
As for row insertion, just specify a default value for the column, like so: