What you're seeing is blocking as a result of the DBCC SHRINKDB
operation. That is causing your database not to respond. Shrink operations move your data pages around on disk in order to create space within the data files to shrink. As a result, the engine must take either page or table locks in order to move the appropriate data pages. While these should be short term, they can block queries.
Additionally, shrink operations are file manipulations (just as adding, modifying, or removing data files) and can not run while a backup of any sort is executing. This is why you see your other error regarding file commands must be serialized.
To observe the blocking, you can run sp_who2 (or better yet, sp_whoisactive) while your DBCC SHRINKDB
is running. Identify the spid the command is running under and you should see it causing blocking (if there's enough activity).
Unfortunately, there's no way to avoid the blocking. This, along with other reasons, is why you should not perform any sort of automated shrink operation. The best plan for resolution is to remove any "Shrink Database" tasks from your maintenance plans.
Best practices are to avoid shrinking databases whenever possible, per Paul Randall's excellent article that was linked by JohnS. If you must shrink your files, you should use the following rules of thumb:
- Perform shrinking manually so you can monitor it's progress and manage as necessary. If it is interrupting user processes, you can kill the shrink with minimal impact. Interrupting a shrink will not "rollback" any of the page movement and you will retain any progress the shrink has already made.
- Keep your shrink increments to small values so as to minimize page movement and keep your maintenance/shrink period manageable.
Well, mystery solved.
I had previously posted this at another site (SqlServerCentral.com), and then posted here after I had not gotten a response for several hours. While I was trying Aaron's suggestion here, a Josh posted the following there:
Did you check the history of the job to see how it was invoked?
Usually you'll see something like "The job succeeded. The Job was
invoked by Schedule XXXX (Schedule_X ). The last step to run was step
1 (XXXXXXXXXXXXXX)."
Does it look the same for each run?
Though I had previously scoured the Job's history, I had not specifically looked for that message. And lo and behold, the unscheduled ones have the following:
... The Job was invoked by Alert 1. ...
So someone must have tagged this job to run after an Alert (likely the wrong job).
Naturally now I feel silly for not checking this before. But I post this here now, so that others in the future may benefit from my oversight.
Best Answer
You haven't done anything wrong or gone crazy - it's a known bug and we've experienced the same thing.
Our process was changed that we would script out job status and reapply after any maintenance plan work - until we could move off of maintenance plans entirely (which we did).
Continuing on from the comments on the main question, there is an owner associated with the package itself which then gets copied over to the jobs and there's no UI to change it (you can modify it directly in the MSDB package tables though!)
We remediated that later on but I am unaware if there's a link between that referenced login disappearing and the jobs being disabled. It's possible.