SQL Server – Excessive Memory Grants from Indexed Foreign Key Cascade Delete

Tags: memory-grant, sql-server, sql-server-2014, t-sql

Each day, stores can enter sales information into our OLTP application. The app calls a stored procedure in SQL Server to save this information. Based on the user's activity, the application sends a code indicating whether the proc should perform an insert, update, or delete.

This save stored procedure is receiving memory grants of 60 GB for deletes of a single row. To reproduce the issue, I ran an ad hoc delete query between a begin tran and a rollback and captured the actual plan below:

https://www.brentozar.com/pastetheplan/?id=r189liBI4

The schema is as follows:

Daily_Item_Sales_Headers -- ~100 million rows on this system
========================
DlyItmSlsHdr_Key decimal(15,0) primary key nonclustered,
DlyItmSlsHdr_PaperworkBatch_Key decimal(15,0), -- FK to parent batch of data that contains other types of data
UK_DlyItmSlsHdr_PaperworkBatchKey_Key clustered, unique, unique key located on PRIMARY (DlyItmSlsHdr_PaperworkBatch_Key, DlyItmSlsHdr_Key)

Daily_Item_Sales -- ~790 million rows on this system
================
DlyItmSls_Key decimal(15,0) primary key nonclustered,
DlyItmSls_DlyItmSlsHdr_Key decimal(15,0), -- FK to header table, cascade delete,
UK_DlyItmSls_DlyItmSlsHdrKey_Key clustered unique constraint on (DlyItmSls_DlyItmSlsHdr_Key, DlyItmSls_Key)
[columns about sales data]
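
For readers who prefer DDL, the description above could be written roughly as follows. This is only a sketch, not the production script: the primary key and foreign key constraint names (PK_DlyItmSlsHdr, PK_DlyItmSls, FK_DlyItmSls_DlyItmSlsHdr) and the NOT NULL settings are assumptions, and the sales data columns are omitted.

```sql
-- Sketch of the schema described above. Constraint names marked
-- "hypothetical" and all NOT NULL settings are assumptions.
CREATE TABLE Daily_Item_Sales_Headers
(
    DlyItmSlsHdr_Key                decimal(15,0) NOT NULL,
    DlyItmSlsHdr_PaperworkBatch_Key decimal(15,0) NOT NULL, -- FK to parent batch (not shown)
    CONSTRAINT PK_DlyItmSlsHdr                              -- hypothetical name
        PRIMARY KEY NONCLUSTERED (DlyItmSlsHdr_Key),
    CONSTRAINT UK_DlyItmSlsHdr_PaperworkBatchKey_Key
        UNIQUE CLUSTERED (DlyItmSlsHdr_PaperworkBatch_Key, DlyItmSlsHdr_Key)
        ON [PRIMARY]
);

CREATE TABLE Daily_Item_Sales
(
    DlyItmSls_Key              decimal(15,0) NOT NULL,
    DlyItmSls_DlyItmSlsHdr_Key decimal(15,0) NOT NULL,
    -- [columns about sales data] omitted
    CONSTRAINT PK_DlyItmSls                                 -- hypothetical name
        PRIMARY KEY NONCLUSTERED (DlyItmSls_Key),
    CONSTRAINT UK_DlyItmSls_DlyItmSlsHdrKey_Key
        UNIQUE CLUSTERED (DlyItmSls_DlyItmSlsHdr_Key, DlyItmSls_Key),
    CONSTRAINT FK_DlyItmSls_DlyItmSlsHdr                    -- hypothetical name
        FOREIGN KEY (DlyItmSls_DlyItmSlsHdr_Key)
        REFERENCES Daily_Item_Sales_Headers (DlyItmSlsHdr_Key)
        ON DELETE CASCADE
);
```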

The query I ran is simple:

delete Daily_Item_Sales_Headers where DlyItmSlsHdr_Key = 1
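
A minimal sketch of the repro, wrapped so that nothing is actually deleted. SET STATISTICS XML is just one way to capture the actual plan and is an assumption about tooling; the "Include Actual Execution Plan" option in Management Studio would serve equally well.

```sql
-- Return the actual execution plan as XML alongside the statement.
SET STATISTICS XML ON;

BEGIN TRANSACTION;

    -- The single-row header delete; the cascade to Daily_Item_Sales
    -- shows up in the same plan.
    DELETE Daily_Item_Sales_Headers
    WHERE DlyItmSlsHdr_Key = 1;

-- Undo the delete so the test leaves the data untouched.
ROLLBACK TRANSACTION;

SET STATISTICS XML OFF;
```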

The plan shows the header deletion correctly cascading to the child sales rows, with an index seek on the clustered index of the child table. However, this clustered index seek has an estimated 790 million rows, while the actual number of rows is ~100. The inflated row estimate is causing a memory grant of ~60 GB.

Using dbcc show_statistics on the child table indexes, I was able to see that the statistics were updated last night with a 2% sample size. The histogram shows between 1 and ~33,000 rows estimated per parent key, so the statistics suggest the estimate should be much lower.
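
One way to watch the grant while the statement runs is the sys.dm_exec_query_memory_grants DMV. The sketch below assumes it is run from a second session while the delete is executing, and that the login has VIEW SERVER STATE permission.

```sql
-- Sessions currently holding or waiting on a memory grant,
-- largest requests first. Values are in KB.
SELECT  mg.session_id,
        mg.requested_memory_kb,
        mg.granted_memory_kb,
        mg.used_memory_kb,
        mg.max_used_memory_kb,
        mg.ideal_memory_kb
FROM    sys.dm_exec_query_memory_grants AS mg
ORDER BY mg.requested_memory_kb DESC;
```

A large gap between granted_memory_kb and max_used_memory_kb would be consistent with a grant driven by an overestimate rather than by real need.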
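
For reference, the statistics check can be sketched as below. It assumes the statistics object carries the same name as the clustered unique constraint, which is the usual case for index statistics.

```sql
-- Header: last update time, rows, and rows sampled (shows the ~2% sample).
DBCC SHOW_STATISTICS ('Daily_Item_Sales', UK_DlyItmSls_DlyItmSlsHdrKey_Key)
    WITH STAT_HEADER;

-- Histogram on the leading column (the parent header key):
-- EQ_ROWS and AVG_RANGE_ROWS give the per-key row estimates.
DBCC SHOW_STATISTICS ('Daily_Item_Sales', UK_DlyItmSls_DlyItmSlsHdrKey_Key)
    WITH HISTOGRAM;
```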

Why is this delete query generating such a large memory grant?

I saw this question about excessive sort memory grants that appear to be caused by a bug, but it looks different to me, because there are no sorts in this plan. Maybe it is the same bug applying to the table spools when cascading to the child table?

Excessive sort memory grant

Because of the foreign key cascading, I do not think I can work around the memory grant by deleting the child rows first before deleting the parent rows. This is an OLTP system with up to 10,000 stores working at once, so I cannot drop the foreign keys on demand for a single delete.

EDIT 2/28/2019 1:13 PM CST

The SQL instance has about 400 GB memory allocated to it.

The application has the following trace flags enabled:

1222: deadlock tracing
4199: query processor fixes
2312: use 2014 cardinality estimator
2453: @Table variable cardinality

Disabling trace flags yields different estimates for the child table cascade delete:

Trace Flag 2312    Trace Flag 4199   Row Estimate
===============    ===============   =============
      on                 on            790 million rows
      on                 off           608 rows (very accurate)
      off                on            1 row
      off                off           1 row
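
A hedged sketch of how the cardinality estimator side of this comparison could be repeated per statement without touching the global flags. DBCC TRACESTATUS lists which of the flags are active, and QUERYTRACEON 9481 asks for the legacy cardinality estimator for this statement only, which should correspond to the rows with 2312 off. QUERYTRACEON normally requires sysadmin membership, and there is no equivalent per-query switch shown here for turning 4199 off.

```sql
-- Which of the application's trace flags are currently enabled?
DBCC TRACESTATUS (1222, 4199, 2312, 2453);

BEGIN TRANSACTION;

    -- Same delete, but requesting the legacy cardinality estimator
    -- for this statement only (assumption: the query-level flag
    -- takes precedence over the global 2312 setting).
    DELETE Daily_Item_Sales_Headers
    WHERE DlyItmSlsHdr_Key = 1
    OPTION (QUERYTRACEON 9481);

ROLLBACK TRANSACTION;
```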

Adding QUERYTRACEON for trace flag 9130, mentioned in the linked question, makes no difference.

A coworker found this interesting article about a memory bug in SQL 2014. The linked resolution was to add option (MAX_GRANT_PERCENT = 1) to the query.

https://www.theregister.co.uk/2016/02/09/microsoft_sql_server_2014_bug/

EDIT

The exact SQL Server version is SQL Server 2014 SP2 CU 12.

The database is compatibility level 110 – SQL Server 2012. We are unable to change the compatibility level for a while.

Best Answer

I believe you're encountering the same bug in the new Cardinality Estimator that I did. I wrote about the problem here, with a more-detailed look here.

In short, the new Cardinality Estimator has a bug when estimating rows to be deleted on a cascading delete. If the driving value on the parent table is far enough outside its histogram, the CE will assume that every single row of the child table gets deleted.

Lucky you though, my server took the high estimates and decided to start scanning instead :(

Also luckily for you, you're dealing with a proc instead of ORM madness, so there are viable solutions. One would be to add DBCC TRACEOFF (4199) to your procedure (potentially causing permissions issues). That doesn't future-proof you, though, since in 2016+ the status of 4199 doesn't matter for the CE bug. Another option might be to add OPTION (OPTIMIZE FOR (@1 = <some middle value>)).
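
As a rough sketch of the OPTIMIZE FOR idea inside a procedure: the variable name @HeaderKey and the value 50000000 are both hypothetical. The value stands in for "some middle value" and would need to be a key that falls well inside the parent table's histogram.

```sql
-- @HeaderKey stands in for the procedure's parameter (hypothetical name).
DECLARE @HeaderKey decimal(15,0) = 1;

BEGIN TRANSACTION;

    -- Compile the plan as if a mid-histogram key had been supplied,
    -- regardless of the value actually passed at run time.
    DELETE Daily_Item_Sales_Headers
    WHERE DlyItmSlsHdr_Key = @HeaderKey
    OPTION (OPTIMIZE FOR (@HeaderKey = 50000000)); -- placeholder value

ROLLBACK TRANSACTION; -- test only; the real procedure would commit
```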

Since you're not getting a scanning plan, you could also just strong-arm the memory grant with MAX_GRANT_PERCENT.
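
A minimal sketch of that hint, using the 1 percent figure quoted in the question. The @HeaderKey variable is a hypothetical stand-in for the procedure parameter, and the right percentage for this workload is something to test rather than assume.

```sql
-- @HeaderKey stands in for the procedure's parameter (hypothetical name).
DECLARE @HeaderKey decimal(15,0) = 1;

BEGIN TRANSACTION;

    -- Cap this statement's memory grant at 1% of the configured
    -- maximum grant, whatever the row estimate says.
    DELETE Daily_Item_Sales_Headers
    WHERE DlyItmSlsHdr_Key = @HeaderKey
    OPTION (MAX_GRANT_PERCENT = 1);

ROLLBACK TRANSACTION; -- test only; the real procedure would commit
```

This caps the symptom only. The 790 million row estimate is still in the plan, so it would be worth confirming that the plan shape stays a seek.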

Related Solutions

Sql-server – Performance tuning a cascading delete with many foreign keys

To answer your main question directly, the sorts are there to present rows to update operators (performing deletions in this case) in index key order. The principle at work here is that sorting on the keys will promote sequential access to the index.

This can be a good optimization, though the details depend on your hardware, how likely the affected pages are to be in memory, and whether the sorts can complete within the memory allocated to them. When the optimizer decides the cost of sorting will be paid back by the increased efficiencies associated with sequential index access, it sets a property DMLRequestSort on the update operator:

[Image: DML Request Sort]

The optimizer may also decide to split the update into separate operators to maintain the clustered index (or heap) and then the nonclustered indexes. Often, it will decide to sort more than once: first for the clustered index keys, and then again for the nonclustered index(es). Again, where sorting is considered optimal, each index update operator will have the DMLRequestSort property set to true.

All that said, the things I would fix first would be to eliminate the index scans where the join operator they feed is a nested loops join, and to remove the eager index spools, which are inserting rows into an empty index every time the query is executed. An eager index spool is often the clearest possible sign that you are missing a useful permanent index. The seek predicate in the index spool operator identifies the keys the optimizer would like an index on.

Examples of tables that are missing a nonclustered index (requiring an eager index spool) are:

child6gc8Selections
gc9s
child7s
gc6s

[Image: eager index spools]

Examples of tables that are currently being scanned below a nested loops join are:

child1
parentObjectMessages
child8s
child7s
child6s
child5s
child4s
child3s
child2s

[Image: scan below nested loops]

Taking the example shown above, the Clustered Index Scan has an output list of Id, parentObjectId, the Nested Loops Join predicate is child7s.parentObjectId = parentObject.Id, and the join output column list is child7s.Id.

From that information, it seems a good access method (index) on child7s for this part of the query would be keyed on parentObjectId with Id as an included column. You should be able to work out how best to work this into your existing indexing strategy.
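
As a sketch, such an index might look like the following. The index name is hypothetical, and if Id is already the clustered index key of child7s, it is carried in every nonclustered index anyway, which makes the INCLUDE redundant but harmless.

```sql
-- Supports the join predicate child7s.parentObjectId = parentObject.Id
-- and covers the join's output column child7s.Id.
CREATE NONCLUSTERED INDEX IX_child7s_parentObjectId -- hypothetical name
ON child7s (parentObjectId)
INCLUDE (Id);
```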

The following are examples of tables where the optimizer is currently choosing a hash join. I would check tables like this to ensure that is a reasonable access method:

child6gc8Selections
gc2s
gc5s
gc6Properties

[Image: hash joins]

The table child2bigChild also participates in a merge join where an explicit sort is necessary. Again, I would check to see if this sort could be avoided.

[Image: sort before merge]

Once the basic indexing issues are resolved, we can look at other optimizations if necessary.

SQL Server – Strange Results for DELETE with ROW_NUMBER

This is expected behaviour at the moment: the function gets evaluated on the DELETE stream.

So it actually behaves like this (pseudocode):

DELETE k
OUTPUT Deleted.name, 
       ROW_NUMBER() OVER (ORDER BY Deleted.object_id) as r
FROM (
    SELECT k.*, ROW_NUMBER() OVER (ORDER BY object_id) r
    FROM #o k
) k
WHERE r <> 1 --OUTPUT returns rows with (r = 1)

Although this is the currently defined expected behaviour, it isn't really reasonable, and they say:

Long term, we need to actually fix the behavior of OUTPUT clause to match that of the ANSI SQL standard which will result in change of results. So we will look at the correct semantics for a future version of SQL Server since there might be apps that rely on the current behavior.

I haven't tested on SQL Server 2014, but on 2012 the plan looks as follows.

After the delete operator (to the left) the column values from the deleted rows are sorted back into object_id order and the row_number re-applied.

It looks like the same is happening in your case, judging from the results (temp tables have a negative object_id and so sort first, before sysrscols, which has a low positive object_id of 3).

As well as the dubious semantics of the result, the second sort by object_id doesn't seem strictly necessary in this plan, as the rows are likely already in that order in any event.

As a workaround for this specific case, changing the output clause to OUTPUT Deleted.name, 1 + Deleted.r AS r would work.
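
Spelled out in full, that workaround might look like the sketch below. The shape of the original statement and the way #o is populated are assumptions, since neither is shown here. The + 1 works only because exactly one row (r = 1) is excluded, so the renumbered output is offset by one.

```sql
-- Assumed setup: a temp table of object names and ids.
SELECT name, object_id
INTO   #o
FROM   sys.objects;

WITH k AS
(
    SELECT *,
           ROW_NUMBER() OVER (ORDER BY object_id) AS r
    FROM   #o
)
DELETE k
OUTPUT Deleted.name,
       1 + Deleted.r AS r   -- undo the renumbering applied to the DELETE stream
WHERE  r <> 1;

DROP TABLE #o;
```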

For more complicated WHERE clauses, I think you'd need one pass to calculate the row_number and then a join, e.g.:

ALTER TABLE #o
  ADD CONSTRAINT PK PRIMARY KEY (object_id);

WITH k AS
(
    SELECT *,
           ROW_NUMBER() OVER (ORDER BY object_id) AS r
    FROM   #o
)
MERGE k AS k1
USING k AS k2
ON k1.object_id = k2.object_id
WHEN MATCHED AND k2.r <> 1 THEN
    DELETE
OUTPUT Deleted.name,
       k2.r;
