We are experiencing high disk I/O latency in a specific timeframe in SQL Server. Every day at exactly 4:30 AM we see a significant difference in disk I/O and it lasts up to 6:30 or 9:30 AM. I already checked with the development and storage teams. They confirmed that everything is running normally. Storage team declared that there is no issue in LUN and they are blaming SQL Server.
One thing is sure–some job is triggering exactly at 4:30 AM every day, which causes the high disk I/O. But I am not sure how to find that out. I attached the screen shot of the SQL error log at 4:30 AM. All the monitoring tools and the logs are stating that the transactions are taking longer that usual due to high disk I/O. But nowhere is it mentioned what is causing it.
Ours is 3 node Always On environment. P1 – primary, P2 – Stand only, P3 – Read only.
The issue we are facing now is from P3 (Read only node).
Best Answer
This is too long for a comment, but there won't be enough in here to provide the answer as to WHAT is causing this, just HOW to work out what IT is.
Firstly: What applications are connecting to databases on that server, and are they running batch processes against databases that have grown faster than you've kept track of, thus extending the time required to complete?
Are any backups being taken at this time? Context switching is really high, so there could be backups conflicting with batch processes running at the same time.
Here's an example of how you could capture the job/query causing your issues: