Sql-server – Transactional replication pre and post scripts

sql server, sql-server-2008-r2, transactional-replication

Just a preface that I'm working at the moment as an accidental DBA so sorry for any ignorance.

I need to completely rebuild one of our transactional replications. The indexing strategy on the subscriber database differs from the publisher's: the primary key is the same, but the clustered index and the NC indexes are different. I am aware that, by default, the tables are completely dropped on the subscriber. I believe I can use scripts that run before and after the snapshot is applied to drop and create the indexes.

The clustered indexes on the subscriber and the publisher are both on UniqueIdentifier columns, but on different columns. The table has nearly 1 billion rows and is sadly pretty wide. Just thought that I'd provide that information in case it makes a difference.

My questions are:

Is using the post execution scripts the best way to go about this?
If the clustered index is different on the subscriber than on the publisher, am I best to allow the bulk insert to happen with the publisher's clustered index, or would it be better to set the alternate clustered index before the bulk insert?

Thanks in advance for any help.

Best Answer

At my last job this is exactly what we did for a very similar scenario (also on SQL Server 2008 R2 - Standard Edition), and I was the one to build it out. We had an off-the-shelf vendor application that we used heavily and needed to report out of. It was indexed differently from how we were reporting out of it.

We had one-way transactional replication set up to sync from the vendor app to our report server. I set up pre and post scripts to drop and recreate the appropriate indexes (and some other relevant entities such as indexed views). The scripts checked for object existence before executing, which you can do via the catalog views in the sys schema.

The scripts themselves just pointed to a stored procedure where all the logic lived, on the subscriber server. This just felt more natural for maintaining the code than updating an external script file. The pre script to drop the custom indexes before the schema was synced was important because there was a bug we ran into where transactional replication would occasionally break and stop syncing otherwise. (If you need a code example of my pre and post scripts, let me know and I'll update this answer.)
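Since no example was posted, here is a minimal sketch of how such scripts could look. It is not the original code, and every object name in it (dbo.BigTable, its columns, both index names, and both procedure names) is made up for illustration. The existence checks go through sys.indexes because DROP INDEX IF EXISTS is not available on SQL Server 2008 R2.

```sql
-- Lives in the subscriber database. All object names here are hypothetical.
CREATE PROCEDURE dbo.usp_Repl_DropCustomIndexes
AS
BEGIN
    SET NOCOUNT ON;

    -- Drop the custom nonclustered index only if it exists.
    IF EXISTS (SELECT 1 FROM sys.indexes
               WHERE object_id = OBJECT_ID(N'dbo.BigTable')
                 AND name = N'IX_BigTable_ReportCol')
        DROP INDEX IX_BigTable_ReportCol ON dbo.BigTable;

    -- Drop the subscriber-only clustered index only if it exists.
    IF EXISTS (SELECT 1 FROM sys.indexes
               WHERE object_id = OBJECT_ID(N'dbo.BigTable')
                 AND name = N'CIX_BigTable_AltGuid')
        DROP INDEX CIX_BigTable_AltGuid ON dbo.BigTable;
END
GO

CREATE PROCEDURE dbo.usp_Repl_CreateCustomIndexes
AS
BEGIN
    SET NOCOUNT ON;

    -- Create the clustered index only if the table exists and has none yet
    -- (type = 1 means clustered in sys.indexes).
    IF OBJECT_ID(N'dbo.BigTable') IS NOT NULL
       AND NOT EXISTS (SELECT 1 FROM sys.indexes
                       WHERE object_id = OBJECT_ID(N'dbo.BigTable')
                         AND type = 1)
        CREATE CLUSTERED INDEX CIX_BigTable_AltGuid
            ON dbo.BigTable (AltGuid);

    -- Recreate the reporting index only if it is missing.
    IF OBJECT_ID(N'dbo.BigTable') IS NOT NULL
       AND NOT EXISTS (SELECT 1 FROM sys.indexes
                       WHERE object_id = OBJECT_ID(N'dbo.BigTable')
                         AND name = N'IX_BigTable_ReportCol')
        CREATE NONCLUSTERED INDEX IX_BigTable_ReportCol
            ON dbo.BigTable (ReportCol);
END
GO

-- Contents of the pre-snapshot script file:
EXEC dbo.usp_Repl_DropCustomIndexes;
GO

-- Contents of the post-snapshot script file:
EXEC dbo.usp_Repl_CreateCustomIndexes;
GO
```

The two script files are attached to the publication through the @pre_snapshot_script and @post_snapshot_script parameters of sp_addpublication (or the matching properties of sp_changepublication). Keeping the logic in procedures means the files themselves rarely need to change.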

Regarding your second question, in general it's faster to bulk insert your records into a table and then add your indexes afterwards (this gives SQL Server the full picture, so it can arrange the data in the most efficient manner in one pass). To do this with transactional replication, you'll need to de-select "copy clustered indexes" (and non-clustered indexes, for that matter) in the publication's Properties window under Article Properties for that table. (Letting it sync the data with the Publisher's clustered index essentially doubles the work your server has to do, because you then re-index on the Subscriber with your preferred clustered index.)
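The same setting can also be changed in T-SQL instead of the Properties window. The sketch below assumes a publication called MyPublication and an article called BigTable (both hypothetical names) and clears the two schema_option bits that control index copying: 0x10 for the clustered index and 0x40 for nonclustered indexes. Note that indexes backing primary key and unique constraints are still generated even when these bits are off, so this only helps for indexes that are not tied to such constraints. Treat it as a starting point to test, not a verified procedure.

```sql
-- Run in the publication database on the Publisher.
-- 'MyPublication' and 'BigTable' are hypothetical names.
DECLARE @current bigint, @new_option nvarchar(255);

-- Read the article's current schema_option bitmask (stored as binary(8)).
SELECT @current = CAST(a.schema_option AS bigint)
FROM dbo.sysarticles AS a
JOIN dbo.syspublications AS p ON p.pubid = a.pubid
WHERE p.name = N'MyPublication'
  AND a.name = N'BigTable';

-- Clear 0x10 (copy clustered index) and 0x40 (copy nonclustered indexes),
-- then format the result as a '0x...' string, which sp_changearticle expects.
SET @new_option = CONVERT(nvarchar(255),
                          CAST(@current & ~CAST(0x50 AS bigint) AS binary(8)),
                          1);

EXEC sp_changearticle
    @publication = N'MyPublication',
    @article = N'BigTable',
    @property = N'schema_option',
    @value = @new_option,
    @force_invalidate_snapshot = 1;  -- a new snapshot is needed afterwards
```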

Related Solutions

Sql-server – Should I remove this clustered index

Without further info, this is mostly speculation, but judging from what we have:

a table that is quite wide (1.3 to 4.0 rows per page on average)
the query that is slow is using:
- only PWFID on the join condition,
- two columns Title, SITime on the select list and
- no other column anywhere (WHERE, HAVING etc.)

Then a covering non-clustered index on (PWFID) INCLUDE (SITime, Title) will probably improve the efficiency of the query, as it will need to read a narrower index (and do no lookups to the table, whether it's clustered or a heap). No idea how much of an improvement it would be, as the query involves joining 30-something tables, and the index will not be that narrow either, with the included 500-character column.
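As a sketch, the suggested index would look roughly like this. The table name dbo.WideTable and the index name are placeholders, since the original table name isn't given here; only the column names come from the answer.

```sql
-- dbo.WideTable and the index name are hypothetical.
-- PWFID is the join column; SITime and Title are the selected columns.
CREATE NONCLUSTERED INDEX IX_WideTable_PWFID_Covering
    ON dbo.WideTable (PWFID)
    INCLUDE (SITime, Title);
```

Putting SITime and Title in INCLUDE rather than in the key keeps them out of the non-leaf levels, so the key stays as narrow as possible while the index still covers the query.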

About converting the table to a heap:

This makes me wonder if getting rid of the clustered index and make this table a heap makes sense or not, because I can just "save" the clustered index and free some space for having a covering nonclustered index?

This is irrelevant I think, at least for this and similar queries. It might change/improve the behaviour of insert queries (as no clustered index will have to be maintained) but it may also degrade performance for other queries that depend on finding more columns from the clustered index.

And you won't be saving any (or much) space. The data has to be stored somewhere, whether the table is a heap or clustered.

Adding an NC index is a much less drastic change and I wouldn't expect any side effects, apart from the wanted use of it in the query, but it still needs to be tested.

Removing the clustered index and converting the table to a heap effectively changes the structure of all NC indexes as well as removing one index (the clustered one), so it may have several, more serious effects on many operations and queries, and would need much more testing.

SQL Server Performance – Comparing High Fragmented Heap Performance

Since the three table copies are brand new, there is no fragmentation in place. For a fair comparison I also rebuilt all indexes of the original table.

A more realistic test would be to try to recreate the fragmentation that would have resulted from the table being in each design from the start. This way you are comparing the result of the designs as they would look after real-world use rather than after a fresh rebuild.

If your application keeps a full audit trail for that data then you could perhaps rebuild each copy by replaying that audit. Otherwise you might need to make up some heuristic (for instance, if the data includes a creation or last-modified date, inserting the rows into each copy in that order).

One thing to note when doing this is to do each insert individually rather than as a block copy. When you insert or update many rows at once, SQL Server is bright enough to bunch the index updates to reduce page splits, which it can't do with the individual inserts that would result from real application access. The following will illustrate this (using the amount of space allocated as an indication of free page space fragmentation due to page splits caused by the randomness of UUID ordering):
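A minimal sketch of the date-ordered heuristic, assuming a source table dbo.Original and a test copy dbo.CopyA that share the columns ID, AnotherID and CreatedAt (all of these names are invented for the example). It walks the source in creation order and issues one INSERT per row:

```sql
-- Hypothetical tables: dbo.Original is the source, dbo.CopyA one of the test copies.
-- CreatedAt is an assumed creation-date column.
SET NOCOUNT ON;
DECLARE @ID uniqueidentifier, @AnotherID uniqueidentifier, @CreatedAt datetime;

DECLARE replay CURSOR LOCAL FAST_FORWARD FOR
    SELECT ID, AnotherID, CreatedAt
    FROM dbo.Original
    ORDER BY CreatedAt;

OPEN replay;
FETCH NEXT FROM replay INTO @ID, @AnotherID, @CreatedAt;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- One row per statement, the way application access would insert it.
    INSERT dbo.CopyA (ID, AnotherID, CreatedAt)
    VALUES (@ID, @AnotherID, @CreatedAt);

    FETCH NEXT FROM replay INTO @ID, @AnotherID, @CreatedAt;
END

CLOSE replay;
DEALLOCATE replay;
```

A cursor is normally something to avoid, but here the row-by-row behaviour is the point. On a large table this will be slow, so it may be worth replaying only a representative subset.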

SET NOCOUNT ON
CREATE DATABASE TestFragUUID
GO
USE TestFragUUID
-- Two identical tables: a random GUID clustered key plus a second unique GUID index.
CREATE TABLE IndividualInserts (ID UNIQUEIDENTIFIER PRIMARY KEY CLUSTERED, AnotherID UNIQUEIDENTIFIER UNIQUE)
CREATE TABLE SingleLargeInsert (ID UNIQUEIDENTIFIER PRIMARY KEY CLUSTERED, AnotherID UNIQUEIDENTIFIER UNIQUE)
GO
-- 100,000 single-row inserts: "GO n" (SSMS/sqlcmd) repeats the batch n times.
INSERT IndividualInserts SELECT NEWID(), NEWID()
GO 100000
-- One set-based insert of 10^5 = 100,000 rows: a 10-row CTE cross joined five times.
WITH CTen AS (SELECT TOP 10 name FROM sys.objects)
INSERT SingleLargeInsert
SELECT NEWID(), NEWID()
FROM   CTen unit
CROSS JOIN CTen ten
CROSS JOIN CTen hun
CROSS JOIN CTen tho
CROSS JOIN CTen tth
GO
-- Compare the space allocated to each table.
EXEC sp_spaceused 'IndividualInserts'
EXEC sp_spaceused 'SingleLargeInsert'
GO
-- Clean up.
USE master
DROP DATABASE TestFragUUID
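
To look at fragmentation directly rather than inferring it from allocated space, a query along these lines could be run inside TestFragUUID just before the final USE master / DROP DATABASE step. This is an optional addition to the script above, sketched on the assumption that the two test tables still exist at that point.

```sql
-- Run inside TestFragUUID, before the database is dropped.
-- 'SAMPLED' mode is used so that avg_page_space_used_in_percent is populated.
SELECT OBJECT_NAME(ps.object_id)          AS table_name,
       i.name                             AS index_name,
       ps.page_count,
       ps.avg_fragmentation_in_percent,
       ps.avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
WHERE ps.object_id IN (OBJECT_ID('IndividualInserts'),
                       OBJECT_ID('SingleLargeInsert'))
ORDER BY table_name, ps.index_id;
```

A lower avg_page_space_used_in_percent for IndividualInserts would point to the same page-split effect that the sp_spaceused comparison hints at.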
