Sql-server – Profiler or MDW

performancesql server

Our application support people have stated that something is failing in one of the applications presumably due to some unusual activity/load in the database during specific hours of the day – specifically between 08:30 pm to 09:30 pm every night.
No tangible evidence has been provided, but rather, they are interested to see the statistics/charts of whatever that is happening in the database during the problem hour, and see if they can get any clues from it.
Or at least, to make sure that there are no issues on the side of the database.

Previously I had told them that tracing sessions can help us understand what happens in the database during problem hours. (Mybe I shouldn't have said that?)
Now they are pestring me to provide them with trace data but they don't know what events should be traced. They have given me a vague request that "Can you please provide statistics of good and bad period so we can compare?"

The issue is that I'm note sure what events should be included, either.

To help them troubleshout this issue, I have also been thinking of using MDW (Management Datawarehouse). It's currently not enabled.

Here are m questions:

  • If I eventually decide to enable a SQL Server trace (other than the default which is currently active), what events should I target for the session?
  • Isn't MDW a better choice? What does trace provide (in terms of performance insight) that MDW doesn't?
  • Is MDW a licensed feature? (personally I don' think so but need to be sure)

About the environment:

This is a SQL Server 2008 R2 database, awaiting for an upgrade in the first half of next year. This is a production database.

SolarWinds is used for monitoring of our databases. As per my understanding, it collects a great deal of data already (things like user connections, active sessions, active transactions, Lazy writes/sec, etc). These data are collected every 5 minutes.

We have Oracle DBAs in the team but no other SQL Server DBA. That means unfortunately no one else is around to be able to help me.

I am also new in this environment, not really proficient with SQL Server, at least with regards to performance tuning, as I was mostly doing Oracle in my life.

Best Answer

This is a very broad question because you have no specifics. Tracing and MDW are not necessarily going to give you the answers here, it depends entirely on what

something is failing in one of the applications

actually means. Firstly, have the app support team provided any error messages, symptoms, details or evidence beyond that statement. If they're not reporting any errors or specific symptoms then there is no way to know what to trace - it could be a performance issue, a failing stored procedure execution with no error handling, a race condition between two sessions affecting data in the wrong order (isolation levels) or one of dozens of other issues.

If they have provided information about a specific error occurring, you can configure a trace using the TSQL_Replay template, but under the Errors and Warnings category, include User Error Message as well. Using Profiler to setup the trace is probably easiest and you can then export it to a script file and schedule it as an Agent job for the problem period. This would capture all of the TSQL events during the window and you can identify the actions leading up to the error to determine what is happening.

If it is a specific performance issue (i.e. this app process/task runs very slowly), then you need to identify the cause of the performance issue. The SolarWinds data will helpful here in identifying high utilisation of different resources (Memory, CPU, IO), and check out sp_whoisactive which is a great tool for logging\capturing session information. You can configure a SQL Agent job to run frequently during the bad period and capture session information and try to identify the sessions experiencing or causing the specific performance issue that was reported.

Lastly, if the complaint from Apps is more general, i.e. "performance just sucks between 08:30 PM and 09:30 PM every night", then SolarWinds will be useful here in viewing the performance metrics from a normal period against the bad period and identifying the metrics that deviate from the baseline.

  • Does CPU spike during this period?
  • Does IO latency increase orthroughput decrease during this period?
  • Are there maintenance jobs(backups, integrity checks, index maintenance) running during this period?

Check out this article on performance troubleshooting methodology in SQL Server, it's a great primer for SQL performance investigations.

Finally, as to your specific questions:

If I eventually decide to enable a SQL Server trace (other than the default which is currently active), what events should I target for the session?

The TSQL_Replay template provides good coverage of the TSQL activity occurring on the server and would be a good starting point until you narrow your problem down.

Isn't MDW a better choice? What does trace provide (in terms of performance insight) that MDW doesn't?

Different tools for different uses. MDW is more like for performance monitoring over time, tracking statistics that you can use to develop baselines. SolarWinds is likely already collecting everything you need that MDW could provide.

Is MDW a licensed feature? (personally I don' think so but need to be sure)

It's available in all editions except Express.