Ok, I am making a lot of assumptions (INT instead of VARCHAR(50) being one of them) with this answer, so feel free to correct me if needed. The problem with option B is that it introduces a new join to relate Users to Alerts without any real added benefit. If joining on the UserID, it is best to index the UserID, so you can utilize seeks for your joins.
For Option A, UserID will be the clustering key (index key for the clustered index) on the Users table. UserID will be a nonclustered index key on Alerts table. This will cost 16 bytes per Alert.
For Option B, UserID will be the clustering key on the Users table. UserId will probably be the clustering key in UserMap too, to make joining more efficient. UserKey (assuming this is an INT) would then be a nonclustered index key on the Alerts table. This will cost 4 bytes per Alert. And 20 bytes per UserMap.
Looking at the big picture, one relationship, for Option A, costs 16 bytes of storage, and involves 1 join operation. Whereas, one relationship, for Option B, costs 24 bytes of storage, and involves 2 join operations.
Furthermore, there are a possibility of 340,282,366,920,938,000,000,000,000,000,000,000,000 uniqueidentifiers and only 4,294,967,296 INTs. Implementing a uniqueidentifier to INT map for a this type of relationship could cause unexpected results when you start reusing INTs.
The only reason for creating this type map table, is if you plan on creating a Many to Many relationship between Users and Alerts.
Taking all of this into consideration, I would recommend Option A.
I hope this helps,
Matt
Since you are indicating insert performance is the primary concern, I'd take the recommendation of the DBA and make the clustering key the identity column since it is a unique, monotonically ascending number, which is guaranteed to (almost) never cause page-splits on the table.
Also, don't store the GUID in an nvarchar(100)
column, use the data type designed for them, which is uniqueidentifier
.
Look carefully at the reporting requirements prior to adding indexes to this table. You may find out you don't need an index on the GUID column at all. If you determine that having an index on the GUID column is required, you may find it desirable to have the leading column of the index not be the GUID, perhaps it will be the Group
or TestID
column instead?
If you've determined, through a rigorous design process, that the majority of reporting queries against this table will be made using the session id as the primary component of joins or where clauses, you may actually benefit from the table being clustered on the session id. You'd likely want to set the page fill factor at some point below 100%, perhaps start at 93% and monitor fragmentation levels. Also, it's worth pointing out that fragmentation may not be such a massive problem if you store this table on SSD. Will the session id be propagated throughout the other related tables? If so, you'll likely need non-clustered indexes on the column in those other tables as well, compounding the fragmentation issue. It may be worthwhile maintaining a single table where you insert the GUIDs to obtain a surrogate integer key to use in the other tables, something like:
IF OBJECT_ID('dbo.SessionSurrogate') IS NOT NULL
DROP TABLE dbo.SessionSurrogate;
CREATE TABLE dbo.SessionSurrogate
(
SessionID int NOT NULL
CONSTRAINT PK_SessionSurrogate
PRIMARY KEY NONCLUSTERED
IDENTITY(1,1)
, SessionGUID uniqueidentifier NOT NULL
);
CREATE UNIQUE INDEX CI_SessionSurrogate --could be non-unique if required
ON dbo.SessionSurrogate(SessionGUID);
IF OBJECT_ID('dbo.UserTestGroups') IS NOT NULL
DROP TABLE dbo.UserTestGroups;
CREATE TABLE dbo.UserTestGroups
(
UserTestGroupsID int NOT NULL
CONSTRAINT PK_UserTestGroups
PRIMARY KEY CLUSTERED
IDENTITY(1,1)
, SessionID int NOT NULL
, TestID bigint NOT NULL
, [Group] tinyint NOT NULL
, InsertDate datetime NOT NULL
);
GO
CREATE PROCEDURE dbo.InsertSession
(
@SessionGUID uniqueidentifier
, @TestID bigint
, @Group tinyint
)
AS
BEGIN
INSERT INTO dbo.SessionSurrogate (SessionGUID)
OUTPUT INSERTED.SessionID, @TestID, @Group, GETDATE()
INTO dbo.UserTestGroups (SessionID, TestID, [Group], InsertDate)
OUTPUT INSERTED.SessionID
VALUES (@SessionGUID);
END;
GO
You'd then insert data like:
DECLARE @SessGUID uniqueidentifier = NEWID();
DECLARE @TestID bigint = 42;
DECLARE @Group tinyint = 6;
EXEC dbo.InsertSession @SessGUID, @TestID, @Group;
The benefit here is you're only causing fragmentation on a single table with a very narrow row width (20 bytes), offering somewhere around 400 rows per page. The stored procedure above is very compact and outputs the generated int
for you to use later, if necessary.
Best Answer
In a previous role at a different Organisation to where I am now employed, we faced a similar decision whereby the GUID issue (as pointed out in the link) was a factor for us (both Width and poor choice regarding Index fragmentation) - but we needed to move to Transactional Replication (Peer-to-Peer) to allow our Sites to be Geographically separated.
In our Dev environment we worked through converting most of the GUID's to Identity columns, set up the replication to match the intended Production system and ensured that the Identity columns had separate ranges per node (as per http://technet.microsoft.com/en-us/library/ms146907(v=sql.105).aspx - this also gives suggestions for Automatic Identity Range Management, which may work for you).
Sounds good so far? NO...
The Identity columns were a nightmare to manage, through our Dev testing and working on automating the separation of the ranges, we gave up on them (never made it into production) - in my opinion, it is more trouble than it's worth - we settled back on the GUID's that were already in the database and settled with the known issues of a bit of wasted memory and Index Fragmentation.
For us, this wasn't that detrimental a solution - as almost all of the usage are Write operations with fairly low reads - obviously we have a maintenance plan that tries to take care of the Fragmentation as best it can (there are some Update operations - so Read and then Write but the delay is within threshold).
My 2 pence are - avoid Identity Columns (unless you already have a great plan for them - or you can add a static unit (i.e. Server 1 always has a preceding letter A1, A2 etc. Server 2 = B1, B2, etc)) when replicating and test the life out of it - assuming you can set up a replica Dev system or whatever. Of course it depends on the business whether you can put up with the poor read access that GUIDs provide because of their Index issues!
Good Luck.