This is probably parameter sniffing. To troubleshoot it without running Profiler:
1) Add a GUID to the text of the query as a comment, e.g.
/*ED7446E4-03B1-4279-AB43-6FFE800AD860*/
2) Wait for the problem to recur, so that the query runs slowly from the application and fine from SSMS.
3) Compare the query plans for the two environments (retrievable with the query below).
SELECT *
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) t
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
CROSS APPLY sys.dm_exec_plan_attributes(cp.plan_handle) AS epa
WHERE t.text LIKE '%ED7446E4-03B1-4279-AB43-6FFE800AD860%'
  AND epa.attribute = 'set_options';
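A common reason the application and SSMS end up with different cached plans for the same query text is a mismatch in SET options: SSMS connects with SET ARITHABORT ON by default, while most client libraries (e.g. ADO.NET) leave it OFF, so each connection compiles and caches its own plan. To reproduce the application's slow plan inside SSMS, you can match its settings first (a sketch; check the actual set_options values returned by the query above):

```sql
-- Match a typical application connection before re-running the suspect query
-- in SSMS. With the settings aligned, SSMS will reuse the application's cached
-- plan, letting you observe the "slow" plan directly.
SET ARITHABORT OFF;

-- When decoding set_options: two frequently seen values are 251 (typical
-- ADO.NET connection) and 4347 (typical SSMS connection); they differ by
-- 4096, which is the ARITHABORT bit.
```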
... was hoping I could get ... a rough estimate of what we should be running.
Without more information about your queries and data sizes, it's really difficult to give you any kind of estimate, let alone an accurate estimate.
Database: SQL Server 2008 R2 Enterprise
Windows: Windows Server 2008 R2 Enterprise 64-bit, pretty sure running on VMware
Processor: Intel(R) Xeon(R) CPU E7-4860 @ 2.27 GHz (2 processors)
Installed memory: 4 GB
Two processors (I'm assuming this is exposed in the VM as 2 cores) may or may not be under-provisioned. The cores assigned to a VM aren't necessarily mapped directly to physical cores (or even allowed to use 100% of a single core when it's needed!), so you may find this is a more flexible resource than memory. Without any more information about your workload or hardware/virtualization configuration, I would say increasing this to 4 would be nice-to-have.
Memory allocation. Oh boy. This is grossly under-provisioned for the workload. Windows itself needs a bare minimum of 2-3 GB to stay happy, and each of the 2 users running BIDS on the box will require at least 500 MB. With that, the box is already maxed out, and I haven't even started figuring out how much the database itself will need.
Majority of users interact with database through asp.net website and Report server website.
You didn't say, but if these are running on the same box, memory requirements for them need to be taken into account as well.
Finally, we have a fairly involved data warehousing operation that probably brings in 3 million records per day by way of SSIS packages that also run on the server.
Assuming this runs at night when there are no live users on the system, I don't see this as a problem unless it's taking too long to run. This part of things is the least of your worries; live users are more important.
Our previous requests for additional memory have been denied, with the common response that we need to perform more query optimizations.
As I demonstrated above, the current amount of memory that's been provisioned is completely inadequate. At the same time, though, at the other end of the spectrum, it's exceedingly unlikely you'll be able to get enough memory provisioned to be able to keep the entire database in memory at once.
Even though you got a blanket response like that (which, by the way, probably had more to do with how persuasive your justification for additional resources was, and not the actual resource usage itself), it's highly likely the efficiency of the database could be improved. Yet there's no amount of tuning alone that can fix the issues you're experiencing now; the suggestion of that is a complete non-starter to me.
I would take the overall approach that the amount of memory currently provisioned is below the minimum required (which should be corrected ASAP), and additional resources may be required to improve the user experience to a usable level while improvements are made to increase the efficiency of the systems.
Here are a few thoughts (in order of attack):
You will win if you can prove how much performance improves every time you get more resources provisioned. Keep track of performance metrics using Performance Monitor logging (note: the logging part is very important), including website response times if you can. Start doing this now, before doing anything else. When you do finally get to the minimum amount of memory (you aren't going to get 32 GB right away), suddenly you now have evidence that the added memory improved things... which means adding even more would probably help, too! If you don't collect a baseline on the current configuration, you're going to miss the boat when things are bumped up to the minimum recommended level.
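Alongside the Performance Monitor logs, you can snapshot a couple of headline counters from inside SQL Server itself and log them on a schedule (the counter names below are the standard ones exposed by sys.dm_os_performance_counters):

```sql
-- Page life expectancy: seconds a data page survives in the buffer pool.
-- Persistently low values are the classic symptom of memory starvation.
SELECT [object_name], counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy'
  AND [object_name] LIKE '%Buffer Manager%';

-- Batch Requests/sec is a cumulative counter: sample it twice and divide the
-- delta by the elapsed seconds to get the actual throughput rate.
SELECT cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Batch Requests/sec';
```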
Analyze your server's wait statistics. These will tell you what the biggest bottleneck in the system is. You'll probably find PAGEIOLATCH_XX as the most common/highest wait, which indicates too much physical I/O is being done to fetch pages from disk. This can be alleviated by adding memory, so that physical I/Os become less frequent because the needed data is already cached. While this analysis is pretty much a foregone conclusion, the fact that you've gathered these stats at all gives you more ammo when justifying the need for resources.
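A starting point for that analysis (a sketch; the exclusion list of benign background waits is deliberately partial and worth extending):

```sql
-- Top waits since the last service restart (or since stats were cleared).
-- Filter out harmless background waits so the real bottleneck stands out.
SELECT TOP (10)
       wait_type,
       wait_time_ms / 1000.0 AS wait_time_s,
       waiting_tasks_count,
       100.0 * wait_time_ms / SUM(wait_time_ms) OVER () AS pct_of_total
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('SLEEP_TASK', 'LAZYWRITER_SLEEP',
                        'SQLTRACE_BUFFER_FLUSH', 'BROKER_TASK_STOP',
                        'REQUEST_FOR_DEADLOCK_SEARCH', 'XE_TIMER_EVENT',
                        'CLR_AUTO_EVENT')
ORDER BY wait_time_ms DESC;
```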
As I mentioned above, the bare minimum requirement for memory is not being met. Collect the set of recommended hardware requirements for all the software you're running, and maybe also grab screenshots of Task Manager. This alone should be enough to justify at least 4-8 GB more, on the spot. If they still refuse, try to convince them to allow you to try it out for a week, and give it back after that (you're collecting performance stats, so you won't need to give it back because mid-week you'll be able to prove how much it's improved the situation). If they still refuse, you're being set up to fail; URLT.
If you can offload some of the workload (in particular, avoid remoting in if at all possible), this will increase the amount of memory available for the database, which is more critical.
You won't be able to fit the entire database in memory at once, which means you need to set SQL Server's max memory setting very carefully to prevent memory over-commit, which kills performance like nothing else. Over-commit is actually even worse than simply not being able to fit all the data in memory. It's highly likely you're in this scenario right now simply because there's just no memory available at all, and it's probable that the max memory setting is set to the default (unlimited).
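Checking and capping the setting is quick (the 2048 MB figure below is purely illustrative for a 4 GB box that also hosts other workloads; size it to whatever is actually left after Windows and the other processes take their share):

```sql
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Review the current value first; 2147483647 means it was never changed.
EXEC sp_configure 'max server memory (MB)';

-- Cap it so SQL Server leaves room for everything else on the box.
EXEC sp_configure 'max server memory (MB)', 2048;
RECONFIGURE;
```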
Since you're running SQL Server Enterprise Edition, and memory is at a premium, I would strongly consider implementing data compression. This will trade off an increase in CPU usage for space-savings of memory (and hence reduced disk accesses, which are comparatively very slow).
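Enterprise Edition lets you estimate the benefit before committing (dbo.FactSales below is a placeholder for one of your own large tables):

```sql
-- Estimate how much smaller the table would be with PAGE compression.
EXEC sp_estimate_data_compression_savings
     @schema_name      = 'dbo',
     @object_name      = 'FactSales',
     @index_id         = NULL,    -- all indexes
     @partition_number = NULL,    -- all partitions
     @data_compression = 'PAGE';

-- If the estimate is attractive, rebuild with compression enabled.
-- Note: the rebuild itself is CPU- and I/O-intensive; schedule accordingly.
ALTER TABLE dbo.FactSales REBUILD WITH (DATA_COMPRESSION = PAGE);
```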
Tune the database. It's likely the structures and queries could use improvements as far as indexing and access patterns go. Also, if a lot of data is being frequently scanned and aggregated, creating indexed views, summary tables, or precomputed reports may be very helpful.
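As a sketch of the indexed-view idea (dbo.SalesDetail and its columns are hypothetical; the SCHEMABINDING and COUNT_BIG(*) requirements are real):

```sql
CREATE VIEW dbo.vw_DailySales
WITH SCHEMABINDING
AS
SELECT SaleDate,
       COUNT_BIG(*) AS RowCnt,      -- required in an indexed view with GROUP BY
       SUM(Amount)  AS TotalAmount
FROM dbo.SalesDetail
GROUP BY SaleDate;
GO

-- The unique clustered index is what materializes (and maintains) the results.
CREATE UNIQUE CLUSTERED INDEX IX_vw_DailySales
    ON dbo.vw_DailySales (SaleDate);
```

Once created, queries that aggregate sales by day can be satisfied from the view's index instead of scanning the detail table (on Enterprise Edition the optimizer can even substitute it automatically).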
This might be a longshot because it probably means more hardware provisioning, but implement a caching solution. The fastest query is the one you never make.
Those are just a few ideas. The bottom line is that tuning alone will not solve the problems here, nor will hardware alone, even though the latter probably will alleviate the majority of the immediate issues. That's really how it goes: throw hardware at the problem in the short-term to put out the fire, and throw tuning at the problem in the long-term to fix the root cause as best you can.
Best Answer
It's going to be a challenge to apply a uniform policy across such a varied estate. My first port of call would be to understand the biggest consumers of key resources (CPU, memory, I/O) on each instance, with a view to isolating the problematic databases from the others. Glenn Berry's health-check DMV queries would be a good starting point.
Who's using most of the memory (buffer pool) on each instance?
How long are pages staying in the buffer pool?
Who's generating the most I/O?
Which databases are consuming the most CPU, and which particular procedures?
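The memory and CPU questions can be answered directly from the DMVs, one instance at a time (a sketch; both result sets reset on service restart or plan-cache eviction):

```sql
-- Buffer pool consumption per database (8 KB pages -> MB).
SELECT CASE database_id
         WHEN 32767 THEN '(resource db)'
         ELSE DB_NAME(database_id)
       END                 AS database_name,
       COUNT(*) * 8 / 1024 AS buffer_mb
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY buffer_mb DESC;

-- Top CPU consumers from the plan cache, with the start of each statement.
SELECT TOP (10)
       DB_NAME(t.dbid)             AS database_name,   -- NULL for ad hoc batches
       qs.total_worker_time / 1000 AS cpu_ms,
       qs.execution_count,
       SUBSTRING(t.text, qs.statement_start_offset / 2 + 1, 100) AS statement_start
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) t
ORDER BY qs.total_worker_time DESC;
```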