One main difference is that a unique index can contain a NULL value, which is not allowed in a primary key. Clustered or not, this is the main difference between the practical implementation of a Primary Key versus a Unique Key.
Oh, and the fact that a table can have one PK but many UKs :-).
These are both differences in INTENT not in PERFORMANCE.
Otherwise, I don't think there's any difference.
Behind any PK or UK, SQL Server builds an index (clustered or not, depending on the request), and the way it is used is transparent to whatever the request is coming from.
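As a minimal sketch of both points (NULL handling, and one PK versus many UKs), assuming a SQL Server database and hypothetical table and column names:

```sql
-- Hypothetical table: one primary key, several unique constraints
CREATE TABLE dbo.Demo
(
    Id    INT         NOT NULL CONSTRAINT PK_Demo PRIMARY KEY, -- NULL not allowed
    Email VARCHAR(100) NULL    CONSTRAINT UQ_Demo_Email UNIQUE,
    Phone VARCHAR(20)  NULL    CONSTRAINT UQ_Demo_Phone UNIQUE
);

INSERT INTO dbo.Demo (Id, Email, Phone) VALUES (1, NULL, NULL); -- succeeds
-- INSERT INTO dbo.Demo (Id, Email) VALUES (2, NULL);           -- fails:
-- SQL Server treats NULL as a value here, so a UNIQUE constraint
-- permits only ONE NULL per column (unlike the ANSI behaviour)
```

Note the SQL Server quirk in the last comment: the unique index *can* hold a NULL, but only a single one per indexed column.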
As alluded to by @Souplex in the comments, one possible explanation might be that this column is the first NULL-able column in the non-clustered index it participates in.
For the following setup
CREATE TABLE Foo
(
A UNIQUEIDENTIFIER NOT NULL DEFAULT NEWSEQUENTIALID() PRIMARY KEY,
B CHAR(1) NOT NULL DEFAULT 'B'
)
CREATE NONCLUSTERED INDEX ix
ON Foo(B);
INSERT INTO Foo
(B)
SELECT TOP 100000 'B'
FROM master..spt_values v1,
master..spt_values v2
sys.dm_db_index_physical_stats shows the non-clustered index ix
has 248 leaf pages and a single root page.
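The per-level page counts can be read with a query along these lines (a sketch; the exact statement used above isn't shown):

```sql
-- DETAILED mode is needed to report every index level, not just the leaf
SELECT ips.index_level,  -- 0 = leaf, highest value = root
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.Foo'),
                                    NULL, NULL, 'DETAILED') AS ips
WHERE ips.index_id = INDEXPROPERTY(OBJECT_ID('dbo.Foo'), 'ix', 'IndexID');
```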
A typical row in an index leaf page looks like this:
And in the root page:
Then running...
CHECKPOINT;
GO
ALTER TABLE Foo ALTER COLUMN B CHAR(1) NULL;
SELECT Operation,
Context,
ROUND(SUM([Log Record Length]) / 1024.0,1) AS [Log KB],
COUNT(*) as [OperationCount]
FROM sys.fn_dblog(NULL,NULL)
WHERE AllocUnitName = 'dbo.Foo.ix'
GROUP BY Operation, Context
Returned
+-----------------+--------------------+-------------+----------------+
| Operation | Context | Log KB | OperationCount |
+-----------------+--------------------+-------------+----------------+
| LOP_SET_BITS | LCX_GAM | 4.200000 | 69 |
| LOP_FORMAT_PAGE | LCX_IAM | 0.100000 | 1 |
| LOP_SET_BITS | LCX_IAM | 4.200000 | 69 |
| LOP_FORMAT_PAGE | LCX_INDEX_INTERIOR | 8.700000 | 3 |
| LOP_FORMAT_PAGE | LCX_INDEX_LEAF | 2296.200000 | 285 |
| LOP_MODIFY_ROW | LCX_PFS | 16.300000 | 189 |
+-----------------+--------------------+-------------+----------------+
Checking the index leaf again, the rows now look like this,
and the rows in the upper-level pages as below.
Each row has been updated and now contains two bytes for the column count along with another byte for the NULL_BITMAP.
Due to the extra row width, the non-clustered index now has 285 leaf pages and two intermediate-level pages along with the root page.
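The extra bytes per row can be seen by dumping a leaf page with DBCC PAGE (undocumented but widely used); a sketch, with an illustrative page id rather than one from the actual test:

```sql
DBCC TRACEON(3604);               -- route DBCC output to the client session
-- arguments: database name, file id, page id, print option (3 = full row dump)
DBCC PAGE ('YourDb', 1, 12345, 3);
```

In the dump, each index row's record header shows the column-count bytes and, once a NULL-able column exists, the NULL bitmap byte described above.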
The execution plan for the
ALTER TABLE Foo ALTER COLUMN B CHAR(1) NULL;
looks as follows
This creates a brand new copy of the index rather than updating the existing one and needing to split pages.
Best Answer
I would say that the issue is primarily just what you started out with here, in what I quoted:
UNIQUEIDENTIFIER is 4x bigger than INT (16 bytes compared to 4). This hurts performance in three ways (I am ignoring fragmentation since the question explicitly accounts for that):

1. Every bit compared is another thing to do. More things to do for the CPU takes more time. That time won't be noticed on modern systems when looking at 100 rows. But over 1 million rows? Now you are talking about (in terms of bytes) comparing 16 million things vs. comparing only 4 million things.

2. Memory is not free: neither in terms of money, nor in terms of time spent allocating and deallocating. In order to join rows, the data pages containing those rows must be read into memory (the buffer pool). There is not an infinite amount of memory (especially if you are using Standard Edition or Express Edition). The more memory you take up in one operation, the less is left over for other concurrent operations. But even if memory were infinite, it still takes time to load those values into memory so that they can then be compared. Loading 16 million bytes naturally takes longer than loading 4 million bytes.

3. Disk is not free: even with SSDs, physical I/O is still a cost you pay, which is why things are cached in memory. Data pages are only 8 KB. A 16-byte value reduces the number of rows that can fit on an 8 KB data page compared to a 4-byte value. The more data pages a table takes up, the more physical I/O is needed to read and write the same number of rows.
A related issue is with indexes. You will likely index the columns that you are joining on, and the same concern applies as with the main table: fewer index rows per page means more pages required for the index, meaning more time spent reading and writing every index affected by any given operation. A new row in a table must also be written to every index on that table (minus filtered indexes that would exclude it), and if the join key is the clustered index key, that key is copied into every non-clustered index as well.
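A back-of-envelope illustration of the rows-per-page point, assuming roughly 8,096 usable bytes on an 8 KB page and an assumed ~9 bytes of fixed per-row index overhead (both figures are approximations for the sketch, not exact SQL Server internals):

```sql
-- Approximate leaf rows per page for an index keyed on INT vs UNIQUEIDENTIFIER;
-- integer division, overhead of 9 bytes per row is an assumption
SELECT 8096 / (4  + 9) AS RowsPerPage_INT,              -- roughly 622 rows
       8096 / (16 + 9) AS RowsPerPage_UNIQUEIDENTIFIER; -- roughly 323 rows
```

So under these assumptions the INT-keyed index packs nearly twice as many rows onto each page, halving the pages that have to be read, cached, and written.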