Azure SQL Database – Delete Query Uses 100% CPU

Tags: azure, azure-sql-database

I have a relatively small database, maybe a few hundred thousand rows, and we have a stored procedure that is meant to clear out a 'Dealer' (the application aggregate root) and all of its related data. The database lives on a v12 Azure SQL instance, and when the procedure hits one particular table, it consumes 100% of the instance's DTU/CPU and takes a really long time to finish.

I believe the issue is that the table has a ton of foreign keys, probably about 30. Looking at the execution plan, you can see that it's doing a ton of Nested Loops with index scans to find dependent rows. All of those rows were already deleted prior to the attempt to delete the rows from this particular table, so these loops are actually pointless.

My question is: would it be more efficient to disable the foreign key constraints on these dependent tables, do the delete, and then re-enable the constraints? Would there be some negative effect to doing that, aside from the fact that it's technically possible for some bad data to get inserted while the constraints are disabled? Or is there an even better way to accomplish what I want?
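For reference, the disable/re-enable pattern I'm asking about would look something like this (table and constraint names are made up for illustration; note that re-enabling with `WITH CHECK` revalidates all existing rows, which is itself a potentially expensive scan, while re-enabling without it leaves the constraint untrusted):

```sql
-- Disable the FK on a dependent table (no validation happens while disabled)
ALTER TABLE dbo.Facilities NOCHECK CONSTRAINT FK_Facilities_Chemical;

-- Do the delete without the engine checking that FK
DELETE FROM dbo.Chemical WHERE DealerID = @DealerID;

-- Re-enable WITH CHECK so existing rows are revalidated and the
-- constraint is marked trusted again (the optimizer can rely on it)
ALTER TABLE dbo.Facilities WITH CHECK CHECK CONSTRAINT FK_Facilities_Chemical;
```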

UPDATES
This is a v12 Azure SQL database; it is not full-blown SQL Server 2012 in a VM. I am not batching any deletes, and the offending statement is part of a larger SPROC. I identified this particular table delete by tearing the sproc apart and running bits and pieces of it separately. When it got to this table, it took literally 15 minutes to complete. It will affect on the order of hundreds of rows, varying depending on the state of the particular 'Dealer' we're clearing out.

We cannot do soft deletes in this case because usually we're clearing this dealer out so it can be replaced (soft deletes would cause duplication and would require a lot of code to work around). I have investigated other types of blocking and wait states and found nothing else that seemed to be negatively affecting this. As I mentioned this database is relatively small, and it's mostly used for very simple CRUD operations.

Here is the execution plan; it's huge.
[Screenshot: DELETE FROM Chemical execution plan]

The XML version of the execution plan can be found here: https://gist.github.com/CodingGorilla/6cf7a87df9257d5f93e0d545af9839c2

Best Answer

As you note, the plan shows many index scans[1], with some indexes appearing multiple times, such as Facilities.IX_CustomerID. That will be why it is spinning the CPU resource: those indexes will be in the buffer pool, so little IO is needed and the CPUs are being made to chew through them, probably in their entirety, repeatedly.

If this is from a simple DELETE statement and all the other activity is the engine enforcing foreign key constraints, then I would check for foreign keys without supporting indexes. For instance, one of the scans of [Facilities].[IX_CustomerID] has the search predicate [DataMateWeb].[dbo].[Facilities].[SurfaceCleanerID]=[DataMateWeb].[dbo].[Chemical].[ChemicalID], which suggests an index covering [Facilities].[SurfaceCleanerID] is not present or for some reason cannot be used. When you define a foreign key, an index is not automatically created to go with it, which can surprise people, and that would explain a scan of the table/cluster rather than multiple seeks on an index. For any indexes that are the supporting indexes for the foreign keys, make sure that they are correctly defined and don't have badly out-of-date statistics.
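As a starting point, something like the following catalog-view query can list foreign key columns that don't lead any index (it is a sketch: it checks each FK column individually, so multi-column foreign keys need a closer look, and you should verify the results against your schema):

```sql
-- Foreign key columns on referencing tables that are not the leading
-- column of any index (candidates for a missing supporting index)
SELECT fk.name                              AS foreign_key,
       OBJECT_NAME(fkc.parent_object_id)    AS referencing_table,
       c.name                               AS referencing_column
FROM sys.foreign_keys fk
JOIN sys.foreign_key_columns fkc
     ON fkc.constraint_object_id = fk.object_id
JOIN sys.columns c
     ON c.object_id = fkc.parent_object_id
    AND c.column_id = fkc.parent_column_id
WHERE NOT EXISTS (
    SELECT 1
    FROM sys.index_columns ic
    WHERE ic.object_id   = fkc.parent_object_id
      AND ic.column_id   = fkc.parent_column_id
      AND ic.key_ordinal = 1);  -- column leads some index

-- If [Facilities].[SurfaceCleanerID] turns up, the supporting index
-- would be along these lines:
CREATE NONCLUSTERED INDEX IX_Facilities_SurfaceCleanerID
    ON dbo.Facilities (SurfaceCleanerID);
```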

If the delete operation isn't that simple, then make sure any joins and sub-queries used in deciding which rows will be deleted are supported by appropriate indexes, and that the joining and filtering predicates are arranged such that they are sargable wherever possible.
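A quick illustration of the sargability point (column names are hypothetical): wrapping the column in a function forces a scan, while comparing the bare column lets the engine seek an index on it.

```sql
-- Non-sargable: the function on the column defeats an index seek
DELETE FROM dbo.Chemical WHERE UPPER(DealerCode) = 'ACME';

-- Sargable: the bare column can be matched by a seek on an
-- index over DealerCode (assuming a case-insensitive collation)
DELETE FROM dbo.Chemical WHERE DealerCode = 'ACME';
```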

Failing the above (i.e. all the FKs and any other filtering & joining activity look fine with regard to having indexes they could potentially seek on), the next straw I'd clutch at is this: if you are deleting many rows at once (DELETE ... WHERE ... IN (<a result that could be a long list>)), I've seen that make SQL Server scan the index instead of using multiple seeks. This feels a bit nasty[2], but: check it by trying to confirm the behaviour with just one row, and if that changes at least some of the scans to seeks (and improves performance as a result), consider cursoring through the rows that need to be removed and deleting them one at a time.
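That one-row-at-a-time approach might be sketched like this (table, column, and variable names are illustrative; @DealerID is assumed to be a parameter of the surrounding procedure):

```sql
DECLARE @ChemicalID int;

DECLARE victims CURSOR LOCAL FAST_FORWARD FOR
    SELECT ChemicalID FROM dbo.Chemical WHERE DealerID = @DealerID;

OPEN victims;
FETCH NEXT FROM victims INTO @ChemicalID;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Single-row delete: an equality predicate on the key, which the
    -- FK-checking seeks can be driven from
    DELETE FROM dbo.Chemical WHERE ChemicalID = @ChemicalID;
    FETCH NEXT FROM victims INTO @ChemicalID;
END;
CLOSE victims;
DEALLOCATE victims;
```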

[1] Which, as they are clustered index scans, are effectively table scans, which for large tables means a lot of data is being touched.
[2] Because it is nasty. But if it is nasty and works, it still works!