Ok, I am making a lot of assumptions (INT instead of VARCHAR(50) being one of them) with this answer, so feel free to correct me if needed. The problem with option B is that it introduces a new join to relate Users to Alerts without any real added benefit. If joining on the UserID, it is best to index the UserID, so you can utilize seeks for your joins.
For Option A, UserID will be the clustering key (index key for the clustered index) on the Users table. UserID will be a nonclustered index key on Alerts table. This will cost 16 bytes per Alert.
For Option B, UserID will be the clustering key on the Users table. UserId will probably be the clustering key in UserMap too, to make joining more efficient. UserKey (assuming this is an INT) would then be a nonclustered index key on the Alerts table. This will cost 4 bytes per Alert. And 20 bytes per UserMap.
Looking at the big picture, one relationship, for Option A, costs 16 bytes of storage, and involves 1 join operation. Whereas, one relationship, for Option B, costs 24 bytes of storage, and involves 2 join operations.
Furthermore, there are a possibility of 340,282,366,920,938,000,000,000,000,000,000,000,000 uniqueidentifiers and only 4,294,967,296 INTs. Implementing a uniqueidentifier to INT map for a this type of relationship could cause unexpected results when you start reusing INTs.
The only reason for creating this type map table, is if you plan on creating a Many to Many relationship between Users and Alerts.
Taking all of this into consideration, I would recommend Option A.
I hope this helps,
Matt
First thing is you will need to stop performing shorthand math operations like + against date/time values. This breaks when you switch to the newer types, which you should do.
For this particular problem, you can simply use a CASE
expression to either (a) return the datetime
as is, or (b) convert to date
and then add 11 hours.
CASE WHEN DATEPART(HOUR, col) < 11
THEN col
ELSE DATEADD(HOUR, 11, CONVERT(datetime, CONVERT(date, col)))
END
Best Answer
No. You do not want to make that change as no good can come of it.
You currently have a PK (Clustered, I assume) that is:
You are wanting to change this to a Clustered PK that is:
NVARCHAR
).UNIQUEIDENTIFIER
it is not so bad as that is still a binary comparisons, just like withINT
, but is 16 bytes unlikeINT
which is 4. Storing this as a string is now a 36 character comparison (36 bytes inVARCHAR
and 72 bytes inNVARCHAR
) which is slower than the 16 byteUNIQUEIDENTIFIER
. Finally, most people storing GUIDs as strings forget to use a binary Collation (e.g.Latin1_General_100_BIN2
) to at least make the comparisons byte-by-byte since linguistic rules aren't necessary. Using a case-insensitive Collation, or even a case-sensitive one, is definitely slower as it will apply locale-based linguistic rules.FILLFACTOR
to reduce the number of page splits, you are also reducing the performance of the index as it is spread out over a larger amount of pages.Please keep in mind the down-stream negative effects of this change due to:
Clustered index keys are copied into non-clustered indexes. Assuming, again, that this PK is Clustered, then each non-clustered index on this table will have that 72 byte value copied into it. Three non-clustered indexes on this table is 72 bytes * 3 = 216 bytes plus the original 72 of the Clustered index = 288 bytes total. On the other hand, the current
INT
would only be 4 bytes * 3 = 12 bytes plus the original 4 = 16 bytes total. Per row.PKs are often used by FKs, which is the PK copied into one or more other tables. 72 bytes per row is just for this table. If this PK is used in 2 other tables as a Foreign Key, then that is 72 bytes * 2 = 144 extra bytes. On the other hand the current
INT
would only be 4 bytes * 2 = 8 bytes.Is the FK column indexed? If yes, then that is another 72 bytes instead of 4.
Data pages are loaded into memory (i.e. the Buffer Pool) before their row(s) can be read and used. Larger rows and/or lower
FILLFACTOR
means more data pages are required to hold those rows. That means more time to read them from disk into memory, and obviously the more memory they will require. This need competes with other queries, plan cache, etc.If you need to have a GUID to have a value that is know to external systems, just add a
UNIQUEIDENTIFIER
column and index it; you can then look it up to get theINT
value to use for all other JOINs, etc. And, if it has to be stored as a string, then useVARCHAR
given the only characters areA
,B
,C
,D
,E
,F
, and-
(NVARCHAR
is a completely unnecessary waste of space) and be sure to specify a binary Collation (one ending in_BIN2
).But don't change the current structure.
Regarding the update to the question in which it was clarified that the intended datatype to change to is really
UNIQUEIDENTIFIER
and notNVARCHAR(36)
: the overall advice does not change from "do not do it". WhileUNIQUEIDENTIFIER
is a better choice thanNVARCHAR(36)
(smaller and comparisons are binary), it is really just a "less bad" choice more than a "better / good choice".