Ideally, for an operational production database, you don't want developers having any access to the server, or to any database on it, at all. Locking that down is one of the first things you'll have to do for SOX compliance.
As for the rights that userIDs run under, the only rights they really should have are `db_datareader`, `db_datawriter`, and an explicit `GRANT EXECUTE ON x TO y` (for each stored proc and user-defined function `x`, for each userID `y`).
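A minimal sketch of that pattern; the database, proc, function, and user names below are hypothetical placeholders:

```sql
USE MyAppDb;

-- Table data access via the two fixed database roles only
EXEC sp_addrolemember N'db_datareader', N'app_user';
EXEC sp_addrolemember N'db_datawriter', N'app_user';

-- Explicit execute rights on each stored proc / UDF the app actually calls
GRANT EXECUTE ON dbo.usp_GetOrders TO app_user;
GRANT EXECUTE ON dbo.fn_OrderTotal TO app_user;
```

(On SQL Server 2012 and later, `ALTER ROLE ... ADD MEMBER` is the preferred syntax over `sp_addrolemember`, but the older procedure works across versions.)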
If you need to be running traces in production, you have some problems, and it will take a Great Wall of Text™ to explain it all. My first recommendation is to have a QA environment that is locked down just like production; if traces need to be run, restore a backup of the prod db to QA and run the traces there. Again, if you have SOX, HIPAA or PCI-DSS requirements, then you'd better sanitize the prod data before restoring it to QA.
We currently have a Windows group for DBAs that has rights to all of our servers and databases.
Give them logon and view-data rights; to perform actual DBA duties, though, use a separate login with elevated privileges. I know one financial customer that does this: the regular Windows-authentication logins were limited in the damage they could inadvertently do, and restores and running DML required the separate SQL authentication login.
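A sketch of that separation, with hypothetical names throughout; the Windows login can connect and read, while a separate SQL authentication login carries the elevated rights:

```sql
-- Day-to-day Windows login: can connect and read data, nothing more.
CREATE LOGIN [CORP\jsmith] FROM WINDOWS;

USE ProdDb;
CREATE USER [CORP\jsmith] FOR LOGIN [CORP\jsmith];
EXEC sp_addrolemember N'db_datareader', N'CORP\jsmith';

-- Separate SQL authentication login, used only for restores and DML.
-- sysadmin shown for brevity; a narrower role (e.g. dbcreator) is safer.
CREATE LOGIN dba_jsmith WITH PASSWORD = N'<strong password here>';
EXEC sp_addsrvrolemember N'dba_jsmith', N'sysadmin';
```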
One government agency I worked with used 2 separate logins for each server/db admin. So if `Tangurena` was my domain login (with regular `User` privileges), then `TangurenaAdmin` would be my separate `Administrator` login. You can get into the habit of using your admin account all the time, but then it lacks permissions for other things (like email. Oh, you say that like it is a bad thing...).
The current government agency I'm working with gives each server/db admin privileges elevated above the standard user, but not quite admin (think of it as the `PowerUser` group). Domain admin functions are performed with a shared domain admin account.
A common error is restoring the wrong database (like QA restored over the production server), and this isn't going to be solved via restricted rights or multiple logins. Doing potentially destructive things in pairs is one way to minimize the risks.
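One cheap supplement to working in pairs is a guard clause at the top of every restore script. A sketch, with hypothetical server and database names:

```sql
-- Abort if this script is accidentally run against the wrong server.
-- 'QASQL01', QADb and the backup path are hypothetical placeholders.
DECLARE @server sysname = @@SERVERNAME;

IF @server <> N'QASQL01'
    RAISERROR(N'Wrong server: %s. Not running the restore.', 16, 1, @server);
ELSE
    RESTORE DATABASE QADb
        FROM DISK = N'C:\Backups\ProdDb.bak'
        WITH REPLACE;
```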
> I'm guessing we'd need elevated privs to run traces as `sa`
No. You only need the `ALTER TRACE` permission:
http://msdn.microsoft.com/en-us/library/ms187611.aspx
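For example (the login name is hypothetical):

```sql
-- ALTER TRACE is a server-level permission, so grant it from master.
USE master;
GRANT ALTER TRACE TO [CORP\trace_user];
```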
> Is it possible that this frequency of spills could be a primary culprit in our high tempdb write latency?
Yes, it is possible, though typically it is the average size of the spills, and how deep they go (e.g. recursive hash spills, multi-pass sorts), that matters more than the frequency per se.
SQL Server provides a wide range of metrics and DMV information to help you troubleshoot the various contributing factors to tempdb pressure, many of which are discussed in the Microsoft Technical Article, "Working with tempdb in SQL Server 2005" (applies to all versions 2005 onward).
You should be able to use the guidance and diagnostic queries contained in that document to start identifying the primary causes of any tempdb pressure. Do not disregard e.g. version store activity simply because `ALLOW_SNAPSHOT_ISOLATION` is not enabled. Many features use the version store (e.g. triggers, MARS, RCSI) aside from snapshot isolation.
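One quick check on where tempdb space is actually going, version store included, is `sys.dm_db_file_space_usage`:

```sql
-- Breakdown of tempdb page consumption; run in tempdb.
-- Pages are 8 KB, hence the * 8 / 1024 to get MB.
USE tempdb;
SELECT
    SUM(user_object_reserved_page_count)     * 8 / 1024 AS user_objects_mb,
    SUM(internal_object_reserved_page_count) * 8 / 1024 AS internal_objects_mb, -- sorts, hashes, spools
    SUM(version_store_reserved_page_count)   * 8 / 1024 AS version_store_mb,
    SUM(unallocated_extent_page_count)       * 8 / 1024 AS free_mb
FROM sys.dm_db_file_space_usage;
```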
If sort and hash spills do turn out to be significant at a high level, you will probably need to set up some specific monitoring for this. Depending a little on your SQL Server version, this is not always as straightforward as one might hope. To connect sort and hash spills with the particular query that caused them requires Event Notifications or Extended Events. The SolidQ article, "Identifying and Solving Sort Warnings", contains details and some good general advice about resolving common causes.
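As a sketch of the Extended Events route, the session below ties spills to the statement that caused them. It assumes SQL Server 2012 or later; the `sort_warning` event does not exist in earlier versions, where you would fall back to Event Notifications as the article describes. The session name is hypothetical:

```sql
-- Capture sort and hash spills along with the offending statement text.
CREATE EVENT SESSION spill_monitor ON SERVER
ADD EVENT sqlserver.sort_warning
    (ACTION (sqlserver.sql_text, sqlserver.database_id)),
ADD EVENT sqlserver.hash_warning
    (ACTION (sqlserver.sql_text, sqlserver.database_id))
ADD TARGET package0.event_file (SET filename = N'spill_monitor');
GO
ALTER EVENT SESSION spill_monitor ON SERVER STATE = START;
```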
You should also work with your storage team to determine how much of the high latency is attributable to your workload, how much comes from other shared uses, and what options there are for reconfiguration. Your analysis of SQL Server's metrics will help inform this discussion, as will any metrics the SAN people are able to provide.
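For the SQL Server side of that conversation, `sys.dm_io_virtual_file_stats` gives per-file stall times you can turn into average latencies, e.g. for tempdb:

```sql
-- Average read/write latency per tempdb file since the last restart.
SELECT
    mf.name,
    vfs.num_of_reads,
    vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
    vfs.num_of_writes,
    vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(DB_ID('tempdb'), NULL) AS vfs
JOIN sys.master_files AS mf
    ON mf.database_id = vfs.database_id
   AND mf.file_id = vfs.file_id;
```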
Best Answer
The short answer is that we eventually, literally, pulled the plug on the SQL Server and got away with it (although I wouldn't necessarily recommend the approach). If you're in a situation like we were, though, here's the full scenario: