Turning my comment into an answer.
There seems to be some misunderstandings of how certain performance counters work when it comes to availability groups. Only Synchronous Commit replicas will be counted toward Transaction Delay (ms)/Sec
Asynchronous Commit replicas will not change this counter as they transfer no mirrored transactions (yes, this could have been called something better).
Let me correct some of the assumptions in the thought process in the original question.
2.Using AG1 as Sync has a Transaction Delayed time of 1000ms/sec which is very bad and we can't figure out why. HADR_SYNC_COMMIT wait time is all over the place.
Transaction Delay (ms)/Sec, in a vacuum, means only that you have at least two synchronous commit replicas. This is because ONLY synchronous commit replicas have this value as we wait for the acknowledgements from the other synchronous replica that the data is hardened.
In order to have a complete picture and get an AVERAGE of how much latency overhead PER TRANSACTION we will need to capture the Mirrored Transactions/Sec
counter. This counter will give us how many synchronous commit transactions were "mirrored" to other synchronous replicas. If we take the Transaction Delay (ms)/Sec
and divide it by the Mirrored Transactions/Sec
counter the end result will be the average delay in ms per transaction.
3.Using AG1 with Async we have 0 latency, but that is something we can't use.
This is not accurate at all. Ther, in fact, is GREATER latency is getting the transaction blocks hardened on the asynchronous commit replicas it's just that we don't wait for them to be hardened like we do with synchronous. Thus, there is still a delay (your network doesn't magically become a non-wait environment) it's just not counted in that specific counter as that counter ONLY contains information for synchronous commit replicas and transactions.
6.During the insert which is a single session running sequential insert commands (or update, or delete) I ran a trace on that session and saw that the duration was exactly the same as the graphs below.
There is not nearly enough information to make me believe this is the case. Even on some pretty terrible infrastructure I normally see 200-300 ms and nothing near 1 second PER individual transaction.
7.When running the same test without Sync (meaning Async, or directly on each node) I didn't see any latency, the duration for each operation was 0 (zero).
See my response to 3. which is also the same answer to this.
Follow Up
- Capture both counters,
Transaction Delay (ms)/Sec
AND Mirrored Transactions/Sec
and post back the results.
- Test against your specific workload as the change in workload types will greatly factor in to response times.
Best Answer
There isn't a one sized fits all answer, so assuming that the reporting workload doesn't run the secondary replica out of CPU/Memory/Disk/Network resources and there isn't any contention for said resources then the primary should not be impacted but log growth will occur, which in the end could impact the performance of the AG/SQL.
That's the gist of it, if you can't truncate and are growing log that may eat up CPU/Disk time that is wasted or cause an issue with the disk growing too much and eating all available space. Don't laugh (or do!), I have witnessed it multiple times - it really does happen.
The other item which isn't discussed is synchronization and redo. Assuming synchronization stays nominal (synchronized) the redo may fall behind. Yes, at some point it might "catch up" but the SLA was already missed. REDO queue build up is a thing in extremely high performant environments and even though SQL Server 2016+ has the potential for parallel redo it doesn't mean it'll always be faster or that the database is using parallel redo. I'm assuming we don't care about automatic failover or failover times (database startup times will be affected).
ASSUMING all of the previous, it acts no differently than 2012/2014 in terms of readable secondary replicas. Yes, there were enhancements but the real reason why there are performance implications on the primary is mostly due to a resource being run out on a secondary and causing long waits which show up in the primary as
HADR_SYNC_COMMIT
.