Deadlocks and blocking locks are two different concepts that you need to understand.
A deadlock is a situation where process/action 1 is waiting for process/action 2 to finish and, at the same time, process/action 2 is waiting for process/action 1 to finish. In other words, they would wait forever, since each is waiting on the other.
In your scenario, something else is happening:
Process 1 is performing an action and has taken a lock on a resource to complete it. Process 2 now wants to start an action that requires a lock on the same resource, so it has to wait until process 1 completes and releases the lock. The key point is that at no moment are the two processes waiting for each other. One process is simply waiting for the other to finish an action on the same resource.
I hope that's clear.
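To make the distinction concrete, here is a minimal Python sketch (not anything SQL Server exposes directly). It models "who waits for whom" as a dictionary and reports a deadlock only when the waits form a cycle. The function name `find_cycle` and the process labels are hypothetical, chosen purely for illustration.

```python
from typing import Dict, List, Optional


def find_cycle(waits_for: Dict[str, str]) -> Optional[List[str]]:
    """Return the processes forming a wait cycle, or None if there is none.

    waits_for maps a waiting process to the process it is waiting on.
    """
    for start in waits_for:
        seen = [start]
        current = waits_for.get(start)
        while current is not None:
            if current == start:
                return seen  # we walked back to where we began: a cycle
            if current in seen:
                break  # a cycle that excludes `start`; a later start finds it
            seen.append(current)
            current = waits_for.get(current)
    return None


# Blocking: process 2 waits for process 1, which waits for nobody.
blocking = {"process 2": "process 1"}

# Deadlock: each process waits for the other.
deadlock = {"process 1": "process 2", "process 2": "process 1"}

print(find_cycle(blocking))
print(find_cycle(deadlock))
```

The first call finds no cycle, because process 1 will eventually finish and release its lock. The second call returns both processes, because neither can ever proceed.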
On to how we fix your issue:
Can you post the table definition, the indexes on the table, and the delete/select statement? We could then look for ways to reduce the likelihood of blocking locks.
On the face of it, this looks like a classic lookup deadlock. The essential ingredients for this deadlock pattern are:
- a SELECT query that uses a non-covering nonclustered index with a Key Lookup
- an INSERT query that modifies the clustered index and then the nonclustered index
The SELECT accesses the nonclustered index first, then the clustered index. The INSERT accesses the clustered index first, then the nonclustered index. Accessing the same resources in a different order while acquiring incompatible locks is a great way to 'achieve' a deadlock, of course.
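The lock ordering described above can be sketched as a toy simulation in Python. This is a deliberately simplified model and not how the SQL Server lock manager works: every lock is treated as incompatible with every other, locks are held until the statement finishes, and the names `simulate`, `plans`, `interleaved` and `serial` are hypothetical.

```python
from typing import Dict, List

# Order in which each statement touches the two indexes (from the text above).
plans: Dict[str, List[str]] = {
    "SELECT": ["nonclustered index", "clustered index"],
    "INSERT": ["clustered index", "nonclustered index"],
}


def simulate(schedule: List[str], plans: Dict[str, List[str]]) -> Dict[str, str]:
    """Run one lock request per schedule entry; return {waiter: blocker}."""
    owners: Dict[str, str] = {}   # resource -> session holding its lock
    waiting: Dict[str, str] = {}  # session -> resource it is stalled on
    position = {session: 0 for session in plans}
    for session in schedule:
        if position[session] >= len(plans[session]):
            continue  # this statement has already finished
        resource = plans[session][position[session]]
        holder = owners.get(resource)
        if holder in (None, session):
            owners[resource] = session
            waiting.pop(session, None)
            position[session] += 1
            if position[session] == len(plans[session]):
                # Statement complete: release every lock it held.
                owners = {r: s for r, s in owners.items() if s != session}
        else:
            waiting[session] = resource  # incompatible lock: must wait
    return {s: owners[r] for s, r in waiting.items() if r in owners}


# Unlucky timing: each statement gets its first lock before the other finishes.
interleaved = ["SELECT", "INSERT", "SELECT", "INSERT"]
# Lucky timing: the SELECT finishes before the INSERT starts.
serial = ["SELECT", "SELECT", "INSERT", "INSERT"]

print(simulate(interleaved, plans))
print(simulate(serial, plans))
```

With the interleaved schedule each statement ends up waiting on the other, which is the deadlock. With the serial schedule nobody waits at all, which fits the observation that this pattern is very sensitive to timing.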
In this case, the SELECT query is:
...and the INSERT query is:
Notice the green-highlighted nonclustered index maintenance.
We would need to see the serial version of the SELECT plan in case it is very different from the parallel version, but as Jonathan Kehayias notes in his guide to Handling Deadlocks, this particular deadlock pattern is very sensitive to timing and internal query execution implementation details. This type of deadlock often comes and goes without an obvious external reason.
Given access to the system concerned, and suitable permissions, I am certain we could eventually work out exactly why the deadlock occurs with the parallel plan but not the serial (assuming the same general shape). Potential lines of enquiry include checking for optimized nested loops and/or prefetching - both of which can internally escalate the isolation level to REPEATABLE READ for the duration of the statement. It is also possible that some feature of parallel index seek range assignment contributes to the issue. If the serial plan becomes available, I might spend some time looking into the details further, as it is potentially interesting.
The usual solution for this type of deadlocking is to make the index covering, though the number of columns in this case might make that impractical (and besides, we are not supposed to mess with such things on SharePoint, I am told). Ultimately, the recommendation for serial-only plans when using SharePoint is there for a reason (though not necessarily a good one, when it comes right down to it). If the change in cost threshold for parallelism fixes the issue for the moment, this is good. Longer term, I would probably look to separate the workloads, perhaps using Resource Governor so that SharePoint internal queries get the desired MAXDOP 1 behaviour and the other application is able to use parallelism.
The question of exchanges appearing in the deadlock trace seems a red herring to me; simply a consequence of the independent threads owning resources which technically must appear in the tree. I cannot see anything to suggest that the exchanges themselves are contributing directly to the deadlocking issue.
If you are only accessing your own database, it is highly unlikely that the deadlocks you are experiencing are caused by other customers that happen to have their databases on the same server - they'd have to be intentionally crossing database boundaries with transactions, and they'd most certainly not have the ability to do that. However, we need a lot more information about what kind of deadlocks you are seeing, as they could involve system objects, etc. Are the objects involved only in your database? Have you looked at the deadlock graphs at all, or are you just responding to "deadlock victim" messages? There is lots of info in this Stack Overflow question about interpreting deadlocks, but you need to capture the graph using Profiler or some kind of monitoring tool (for which you may need to ask the host's assistance).