Sql-server – Indexing a PK GUID in SQL Server 2012

clustered-indexindex-tuningsql serveruniqueidentifier

My developers have setup their application to use GUID's as PK for pretty much all of their tables and by default SQL Server has setup the clustered index on these PK's.

The system is relatively young and our biggest tables are just over a million rows, but we're taking a look at our indexing and want to be able to scale quickly as it may be needed in the near future.

So, my first inclination was to move the clustered index to the created field which is a bigint representation of a DateTime. However, the only way I can make the CX unique would be to include the GUID column in this CX but order by created first.

Would this make the clustering key too wide and would it boost performance for writes? Reads are important too, but writes are probably a bigger concern at this point.

Best Answer

The primary problems with GUIDs, especially non-sequential ones, are:

  • Size of the key (16 bytes vs. 4 bytes for an INT): This means you're storing 4 times the amount of data in your key along with that additional space for any indexes if this is your clustered index.
  • Index fragmentation: It is virtually impossible to keep a non-sequential GUID column defragmented because of the completely random nature of the key values.

So what does this mean to your situation? It comes down to your design. If your system is simply about writes and you have no concern about data retrieval, then the approach outlined by Thomas K is accurate. However, you have to keep in mind that by pursuing this strategy, you're creating many potential issues for reading that data and storing it. As Jon Seigel points out, you will also be occupying more space and essentially having memory bloat.

The primary question around GUIDs is how necessary they are. Developers like them because they ensure global uniqueness, but it's a rare occasion that this kind of uniqueness is necessary. But consider that if your maximum number of values is less than 2,147,483,647 (the maximum value of a 4 byte signed integer), then you're probably not using the appropriate data type for your key. Even by using BIGINT (8 bytes), your max value is 9,223,372,036,854,775,807. This is typically enough for any non-global database (and many global ones) if you need some auto-incrementing value for a unique key.

Finally, as far as using a heap versus a clustered index, if you are purely writing data a heap would be most efficient because you are minimizing overhead for inserts. However, heaps in SQL Server are extremely inefficient for data retrieval. My experience has been that a clustered index is always desirable if you have the opportunity to declare one. I have seen the addition of a clustered index to a table (4 billion+ records) improve overall select performance by a factor of 6.

Additional information: