We've set up log shipping to a secondary SQL Server in Standby/Read-Only mode to offload all SSRS report generation.
This works fine within the restrictions imposed by:
- Kicking users out during the transaction log restore (we got around this by setting up multiple instances and restoring the most recent transaction logs on a round-robin schedule)
- The data being out of date by, at most, the interval of the scheduled transaction log backup/restore job.
Unfortunately, the first time any stored procedure is run after a transaction log restore, it takes much longer than normal to complete. All subsequent executions of that same stored procedure finish within the expected time. If we then execute a different stored procedure, its first run is slow and all subsequent executions complete in the expected time.
For reference, execution takes ~00:02 (mm:ss) normally compared to ~01:00 on the first run.
I assume this has something to do with either server execution statistics or parameter sniffing/cached execution plans.
Is there any way to get around this issue? Or is this inherent to the transaction log restore?
If it were just the very first execution of any single stored procedure, we could get around this easily by executing one upon restore, but it appears to affect the first execution of every stored procedure.
I tried running COUNT(*) on the 11 tables touched by the stored procedure I'm using for testing. The first run took 00:32, and subsequent COUNT(*) runs took 00:00. Unfortunately, this did not have any impact on the first run of the stored procedure.
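One way to check whether those COUNT(*) runs actually left the tables in cache is to look at the buffer pool directly. A sketch (the filter on user objects and the page math are assumptions; run it in the database being tested):

```sql
-- Sketch: count how many pages of each index are currently in the buffer
-- pool for the current database (8 KB per page).
SELECT o.name                AS table_name,
       i.name                AS index_name,
       COUNT(*)              AS cached_pages,
       COUNT(*) * 8 / 1024   AS cached_mb
FROM sys.dm_os_buffer_descriptors AS bd
JOIN sys.allocation_units AS au
    ON au.allocation_unit_id = bd.allocation_unit_id
JOIN sys.partitions AS p
    ON p.hobt_id = au.container_id
   AND au.type IN (1, 3)          -- in-row and row-overflow data
JOIN sys.indexes AS i
    ON i.object_id = p.object_id AND i.index_id = p.index_id
JOIN sys.objects AS o
    ON o.object_id = p.object_id
WHERE bd.database_id = DB_ID()
  AND o.is_ms_shipped = 0
GROUP BY o.name, i.name
ORDER BY cached_pages DESC;
```

Comparing this output before and after the warm-up queries would show whether the right indexes are actually being cached.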
I don't see any results on either my primary or secondary server for is_temporary stats, either before or after execution of a stored procedure.
I'm currently on SQL Server 2012.
Query Execution Plan:
At first glance the query execution plans appear significantly different; however, after saving each plan and opening the generated .sqlplan files, they are exactly the same. The apparent difference comes from the different versions of SSMS I am using: 2014 on the primary server and 2018 on the secondary. When viewing the execution plan on the secondary, it shows "### of ### (##%)" underneath every node's percentage and time cost – neither those numbers, nor the actual execution plan, change upon further executions.
I also compared client statistics and they are almost exactly the same; the only difference is that the primary server executes with 1.4 seconds of Wait time on server replies, while the secondary takes 81.3 seconds.
I do see a large number of PAGEIOLATCH_SH waits from the first execution, as you predicted:
                      diff after first exec    diff after second exec
waiting_tasks_count   10903                    918
wait_time_ms          411129                   12768
One of the odd things about this situation: except for the round-robin multiple-instance part of the setup, our production SSRS server already reads from a standby/read-only database fed by periodic transaction log restores, and it does not experience these slowdowns on the first execution of a stored procedure. Its users are kicked off every time a transaction log is restored, though – which is the problem the setup above is supposed to resolve.
Best Answer
There are a few possible things going on here; here's a non-exhaustive list.

One possibility is that the log restore effectively clears the buffer pool for that database, so the first run of each procedure has to read all of its data from disk – you'd see PAGEIOLATCH* waits during the initial run if you check wait stats. One thing you could do to mitigate this is to warm the cache after each restore by running queries like

SELECT COUNT(*) FROM dbo.YourTable

against the tables the procedures touch.

Providing us with a "fast" and "slow" example of an execution plan could help us track down exactly which thing is happening.
If you are on SQL Server 2012 or newer, another possibility is that synchronous statistics updates are causing the delay. These "readable secondary stats" get created in TempDB, since the log shipping secondary is read-only. You can read more about that here (the article is about Availability Groups, but the same thing applies in this scenario):

AlwaysOn: Making latest statistics available on Readable Secondary, Read-Only database and Database Snapshot

If this is the problem causing your slowdown, then one solution would be to find those stats and create them in the production database, so that they are up to date and available after the restore. You can look for the temp stats by querying sys.stats for rows where is_temporary is set.
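A query along these lines will list them (a sketch; is_temporary is the sys.stats flag that marks statistics created for a read-only database):

```sql
-- Sketch: list statistics that were auto-created in TempDB because the
-- database is read-only (run this in the log-shipped database).
SELECT OBJECT_NAME(s.object_id) AS table_name,
       s.name                   AS stats_name,
       s.auto_created,
       s.is_temporary
FROM sys.stats AS s
WHERE s.is_temporary = 1;
```

An empty result (as reported above, on both servers) would rule this explanation out.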
Based on the wait stats you provided, and the fact that the plans are the same, this is pretty conclusively due to the buffer pool being cleared by the log restore.
On a normal run, you get 12,768 ms (almost 13 seconds) of IO waits.
On the first run, you get 411,129 ms (almost 7 minutes) of IO waits.
The SELECT COUNT(*) approach you tried may not have helped due to different indexes being used by the actual procedure vs. the COUNT(*) query. You have a few options here; one is to force the warm-up query to read the same index the procedure uses:

SELECT COUNT(*) FROM dbo.YourTable WITH (INDEX (IX_Index_Being_Used_By_Proc))
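If the index-hinted warm-up works, it could be added as a final step of the restore job. A minimal sketch, assuming placeholder table, index, procedure, and parameter names:

```sql
-- Sketch: warm the cache after each log restore by scanning the exact
-- indexes the report procedures use. All names below are placeholders.
DECLARE @dummy BIGINT;

SELECT @dummy = COUNT_BIG(*)
FROM dbo.YourTable WITH (INDEX (IX_Index_Being_Used_By_Proc));

-- Repeat for each table/index the slow procedures touch, or simply
-- execute the procedures themselves once with representative parameters:
EXEC dbo.YourProcedure @SomeParam = 1;
```

Running the procedures themselves guarantees the warm-up touches exactly the pages they need, at the cost of one slow "sacrificial" run per restore.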