Sql-server – DB performance vs replication

optimizationreplicationsql server

I am working on a db optimization (planing for future project growth) and need some help.

Currently, every table is using uniqueidentifier as PK (clustered index) and we have high index fragmentation (99%). For the new tables we started using bigint as pk but I don't want a nightmare when bidirectional replication phase comes.

I did my research and uniqueidentifier is not a huge problem (except memory), problem is clusters index on that column (http://www.sqlskills.com/blogs/kimberly/guids-as-primary-keys-andor-the-clustering-key/).

Can we do this to solve our fragmentation problem and any replication nightmare:

  1. Add ROWGUIDCOL to PK column
  2. Add another identity column in every table and move that cluster
    index (not pk) to that column

Would this new identity column cause the same replication problems as if that was bigint PK?
I know SharePoint db primary keys are GUIDs but I am not happy with its performances and probably security has some part in that decision.
We could reduce fragmentation by using seq. GUIDs but we can't create it on application side or return it with SCOPE_IDENTITY(). OUTPUT inserted.ID hack would be time consuming because we need to rewrite complete application DAL.

Finally, is there any valid solution for this problem? Can we use bigint without any replication problem.

Best Answer

In a previous role at a different Organisation to where I am now employed, we faced a similar decision whereby the GUID issue (as pointed out in the link) was a factor for us (both Width and poor choice regarding Index fragmentation) - but we needed to move to Transactional Replication (Peer-to-Peer) to allow our Sites to be Geographically separated.

In our Dev environment we worked through converting most of the GUID's to Identity columns, set up the replication to match the intended Production system and ensured that the Identity columns had separate ranges per node (as per http://technet.microsoft.com/en-us/library/ms146907(v=sql.105).aspx - this also gives suggestions for Automatic Identity Range Management, which may work for you).

Sounds good so far? NO...

The Identity columns were a nightmare to manage, through our Dev testing and working on automating the separation of the ranges, we gave up on them (never made it into production) - in my opinion, it is more trouble than it's worth - we settled back on the GUID's that were already in the database and settled with the known issues of a bit of wasted memory and Index Fragmentation.

For us, this wasn't that detrimental a solution - as almost all of the usage are Write operations with fairly low reads - obviously we have a maintenance plan that tries to take care of the Fragmentation as best it can (there are some Update operations - so Read and then Write but the delay is within threshold).

My 2 pence are - avoid Identity Columns (unless you already have a great plan for them - or you can add a static unit (i.e. Server 1 always has a preceding letter A1, A2 etc. Server 2 = B1, B2, etc)) when replicating and test the life out of it - assuming you can set up a replica Dev system or whatever. Of course it depends on the business whether you can put up with the poor read access that GUIDs provide because of their Index issues!

Good Luck.