My developers have setup their application to use GUID's as PK for pretty much all of their tables and by default SQL Server has setup the clustered index on these PK's.
The system is relatively young and our biggest tables are just over a million rows, but we're taking a look at our indexing and want to be able to scale quickly as it may be needed in the near future.
So, my first inclination was to move the clustered index to the created field which is a bigint representation of a DateTime. However, the only way I can make the CX unique would be to include the GUID column in this CX but order by created first.
Would this make the clustering key too wide and would it boost performance for writes? Reads are important too, but writes are probably a bigger concern at this point.
Best Answer
The primary problems with GUIDs, especially non-sequential ones, are:
So what does this mean to your situation? It comes down to your design. If your system is simply about writes and you have no concern about data retrieval, then the approach outlined by Thomas K is accurate. However, you have to keep in mind that by pursuing this strategy, you're creating many potential issues for reading that data and storing it. As Jon Seigel points out, you will also be occupying more space and essentially having memory bloat.
The primary question around GUIDs is how necessary they are. Developers like them because they ensure global uniqueness, but it's a rare occasion that this kind of uniqueness is necessary. But consider that if your maximum number of values is less than 2,147,483,647 (the maximum value of a 4 byte signed integer), then you're probably not using the appropriate data type for your key. Even by using BIGINT (8 bytes), your max value is 9,223,372,036,854,775,807. This is typically enough for any non-global database (and many global ones) if you need some auto-incrementing value for a unique key.
Finally, as far as using a heap versus a clustered index, if you are purely writing data a heap would be most efficient because you are minimizing overhead for inserts. However, heaps in SQL Server are extremely inefficient for data retrieval. My experience has been that a clustered index is always desirable if you have the opportunity to declare one. I have seen the addition of a clustered index to a table (4 billion+ records) improve overall select performance by a factor of 6.
Additional information: