I have a question: I manage two SQL Servers:
- Server 1 is the Master Server (Publisher + Distributor)
- Server 2 is the Client Server (Subscriber)
The Client Server is owned by a client who buys data from us. The two servers are linked by transactional replication, and only selected tables of a single database are replicated from the Master Server to the Client Server.
There are several problems I want to discuss.
Problem 1:
When I add or delete a data table and take a new replication snapshot, everything vanishes from the customer side and then rebuilds over the next 20-30 minutes. (I only have limited access, so I don't get logs.)
Problem 2:
When I reinitialize the subscription, it takes a long time to complete, and Replication Monitor shows the table scripts as only partially applied.
Problem 3:
The Publisher-to-Distributor section shows high command and transaction counts, but the Distributor-to-Subscriber command and transaction counts are hundreds of times lower than the Publisher section.
Problem 4:
Latency between Publisher and Distributor (Server 1) is very high: it hovers around 5+ minutes, spikes to 7 minutes, then drops back to a few seconds on its own, but the average is very high. Distributor-to-Subscriber latency seems fine.
Problem 5:
The log file of the Master DB has grown to around 3.5 GB (normally 3 GB with 80-87% free space), and the data file has also grown and is now 4 GB+ (normally 3.3-3.5 GB).
Best Answer
First, you can set up a canary table and other detailed monitoring of transactional replication per Kendra Little's article Monitoring SQL Server Transactional Replication. Canary tables, the easiest step in the article, are simply tables whose only job is to receive a timestamp update on the publisher on a schedule, so that querying the subscriber tells you the true end-to-end latency.
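A minimal sketch of the canary-table idea (table, column, and job names here are my own placeholders, not from the article):

```sql
-- On the publisher: a one-row table that is part of the publication.
CREATE TABLE dbo.Canary (
    CanaryId   INT          NOT NULL CONSTRAINT PK_Canary PRIMARY KEY,
    LastUpdate DATETIME2(0) NOT NULL
);
INSERT dbo.Canary (CanaryId, LastUpdate) VALUES (1, SYSUTCDATETIME());

-- A SQL Agent job on the publisher runs this every minute:
UPDATE dbo.Canary SET LastUpdate = SYSUTCDATETIME() WHERE CanaryId = 1;

-- On the subscriber, the gap between now and LastUpdate is your
-- real Publisher -> Distributor -> Subscriber latency:
SELECT DATEDIFF(SECOND, LastUpdate, SYSUTCDATETIME()) AS LatencySeconds
FROM dbo.Canary
WHERE CanaryId = 1;
```

Because the canary row rides through the same log reader and distribution agents as your real data, its staleness measures the whole pipeline rather than any one hop.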
Then take a look at the activity happening on the tables being replicated. It sounds like most of your problems are related to very high write activity: your log file going from 3 GB with 80-87% free (only 13-20% used) to 3.5 GB (meaning the full 3 GB was used) is an indicator of big transactions, which may be saturating your replication bandwidth.
Also, if they take 20-30 minutes to replicate when reinitialized, that's a combination of their size and your free bandwidth.
So, the general fixes are:
Increase the bandwidth
Reduce the amount of data
Use the smallest integer-based keys you can, and replicate lookup tables with descriptions rather than storing the descriptions in your main tables.
Replicate indexed views instead of actual tables (publisher = indexed view, subscriber = table) - send over only the required columns, and only the required rows.
Don't use NCHAR/NVARCHAR unless you need to - use CHAR/VARCHAR when you know a column holds (or perhaps even should hold) no UCS-2 "Unicode" data, since the N types cost two bytes per character.
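To make the indexed-view point concrete, here is a sketch (table, view, and column names are illustrative assumptions): the view must be schema-bound and get a unique clustered index before it can be published as an "indexed view logbased" article that materializes as a plain table on the subscriber.

```sql
-- Publisher side: project only the columns and rows the client needs.
CREATE VIEW dbo.vOrdersForClient
WITH SCHEMABINDING
AS
SELECT OrderId, CustomerId, OrderDate, TotalDue
FROM dbo.Orders
WHERE IsExported = 1;
GO

-- The unique clustered index is what makes the view "indexed"
-- and therefore publishable as its own article.
CREATE UNIQUE CLUSTERED INDEX IX_vOrdersForClient
    ON dbo.vOrdersForClient (OrderId);
```

Replicating this view instead of dbo.Orders means wide or internal columns, and rows the client hasn't paid for, never cross the wire at all.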
Reduce the number of transactions on the replicated data
Whatever you've got happening that causes all that activity on the replicated tables, analyze it. Is it required? Could it be re-written to minimize the amount of data being replicated?
If they don't absolutely need up-to-the-minute data in any given table, then don't replicate the core tables that get all this activity.
For each non-realtime-required table, create reporting tables on the publisher that get updated only every so often (once a week, once a day, once an hour, whatever), and replicate those.
And make sure to carefully use only the UPDATEs, INSERTs, and DELETEs that are required - the goal is to minimize replication load.
Don't use MERGE; it has too many issues.
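A sketch of the reporting-table refresh described above, using only the DELETE/UPDATE/INSERT statements that are strictly needed (all object names are hypothetical; run it from a SQL Agent job on whatever cadence the client accepts):

```sql
-- Refresh dbo.OrdersReport from the busy dbo.Orders table.
-- Only rows that actually changed generate replicated commands.
BEGIN TRAN;

-- Remove rows that no longer exist in the source.
DELETE r
FROM dbo.OrdersReport AS r
WHERE NOT EXISTS (SELECT 1 FROM dbo.Orders AS o WHERE o.OrderId = r.OrderId);

-- Update only rows whose values differ.
UPDATE r
SET r.TotalDue = o.TotalDue
FROM dbo.OrdersReport AS r
JOIN dbo.Orders AS o ON o.OrderId = r.OrderId
WHERE r.TotalDue <> o.TotalDue;

-- Add genuinely new rows.
INSERT dbo.OrdersReport (OrderId, CustomerId, TotalDue)
SELECT o.OrderId, o.CustomerId, o.TotalDue
FROM dbo.Orders AS o
WHERE NOT EXISTS (SELECT 1 FROM dbo.OrdersReport AS r WHERE r.OrderId = o.OrderId);

COMMIT;
```

Note the refresh is three targeted statements rather than MERGE, per the advice above, and unchanged rows are skipped entirely so the distribution database only ever sees the delta.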
Split the tables up into multiple publications, so you aren't taking a snapshot of, and reinitializing, every table every time you have to add or remove one.
I.e. if you add/drop/alter an article in Publication C, you need to snapshot and reinitialize only that publication; Publications A, B, D, E, and F aren't affected by it.
And thus it moves less data, which in turn takes less time.
And the subscriber retains access to all the articles in all the other publications.
So split them up along functional usage lines; if tables A, E, and G are used together most of the time, while C and D aren't involved in most of those queries, that's a natural split that will leave your client happier; at least some functions continue unobstructed during re-initialization of either set.
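Creating the split is just a matter of defining multiple publications over the same database; a sketch with made-up publication and table names (the procedures themselves - sp_addpublication, sp_addpublication_snapshot, sp_addarticle - are the standard replication stored procedures):

```sql
-- Publication 1: the high-churn transactional tables used together.
EXEC sp_addpublication          @publication = N'Pub_Sales', @status = N'active';
EXEC sp_addpublication_snapshot @publication = N'Pub_Sales';
EXEC sp_addarticle @publication = N'Pub_Sales',
                   @article = N'Orders',     @source_object = N'Orders';
EXEC sp_addarticle @publication = N'Pub_Sales',
                   @article = N'OrderLines', @source_object = N'OrderLines';

-- Publication 2: slow-moving reference data, snapshotted independently.
EXEC sp_addpublication          @publication = N'Pub_Reference', @status = N'active';
EXEC sp_addpublication_snapshot @publication = N'Pub_Reference';
EXEC sp_addarticle @publication = N'Pub_Reference',
                   @article = N'Lookups',    @source_object = N'Lookups';
```

With this layout, adding an article to Pub_Reference snapshots and reinitializes only the lookup data, while the client keeps uninterrupted access to everything in Pub_Sales.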